1 Matrices and matrix algebra


1.1 Examples of matrices 


A matrix is a rectangular array of numbers and/or variables. For instance

$$A = \begin{pmatrix} 4 & -2 & 0 & -3 & 1 \\ 5 & 1.2 & -0.7 & x & 3 \\ \pi & -3 & 4 & 6 & 27 \end{pmatrix}$$

is a matrix with 3 rows and 5 columns (a 3 × 5 matrix). The 15 entries of the matrix are referenced by the row and column in which they sit: the (2,3) entry of A is -0.7. We may also write $a_{23} = -0.7$, $a_{24} = x$, etc. We indicate the fact that A is 3 × 5 (this is read as "three by five") by writing $A_{3\times 5}$. Matrices can be enclosed in square brackets as well as large parentheses. That is, both

$$\begin{pmatrix} 2 & 4 \\ 1 & -6 \end{pmatrix} \quad\text{and}\quad \begin{bmatrix} 2 & 4 \\ 1 & -6 \end{bmatrix}$$

are perfectly good ways to write this 2 × 2 matrix.


Real numbers are 1 × 1 matrices. A column vector with three entries is a 3 × 1 matrix. We will generally use upper case Latin letters as symbols for matrices, boldface lower case letters for vectors, and ordinary lower case letters for real numbers.
Definition: Real numbers, when used in matrix computations, are called scalars. 


Matrices are ubiquitous in mathematics and the sciences. Some instances include: 


• Systems of linear algebraic equations (the main subject matter of this course) are normally written as simple matrix equations of the form Ax = y.

• The derivative of a function $f : \mathbb{R}^3 \to \mathbb{R}^2$ is a 2 × 3 matrix.

• First order systems of linear differential equations are written in matrix form.

• The symmetry groups of mathematics and physics, which we'll look at later, are groups of matrices.

• Quantum mechanics can be formulated using infinite-dimensional matrices.

1.2 Operations with matrices


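Addition: matrices of the same size can be added or subtracted by adding or subtracting the corresponding entries:

$$\begin{pmatrix} 2 & 1 \\ -3 & 4 \\ 7 & 0 \end{pmatrix} + \begin{pmatrix} 6 & -1.2 \\ \pi & x \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 8 & -0.2 \\ \pi - 3 & 4 + x \\ 8 & -1 \end{pmatrix}$$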


Definition: If the matrices A and B have the same size, then their sum is the matrix A + B defined by

$$(A + B)_{ij} = a_{ij} + b_{ij}.$$

Their difference is the matrix A - B defined by

$$(A - B)_{ij} = a_{ij} - b_{ij}.$$

Definition: A matrix A can be multiplied by a scalar c to obtain the matrix cA, where

$$(cA)_{ij} = c\,a_{ij}.$$

This is called scalar multiplication. We just multiply each entry of A by c.


Definition: The m × n matrix whose entries are all 0 is denoted $0_{m\times n}$ (or, more often, just by 0 if the dimensions are obvious from context). It's called the zero matrix.

Definition: Two matrices A and B are equal if all their corresponding entries are equal:

$$A = B \iff a_{ij} = b_{ij} \ \text{for all } i, j.$$


Definition: If the number of columns of A equals the number of rows of B, then the product AB is defined by

$$(AB)_{ij} = \sum_{s=1}^{k} a_{is}b_{sj}.$$

Here k is the number of columns of A or rows of B.


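Example:

$$\begin{pmatrix} 1 & 2 & 3 \\ -1 & 0 & 4 \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 4 & 2 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 1\cdot(-1) + 2\cdot 4 + 3\cdot 1 & 1\cdot 0 + 2\cdot 2 + 3\cdot 3 \\ -1\cdot(-1) + 0\cdot 4 + 4\cdot 1 & -1\cdot 0 + 0\cdot 2 + 4\cdot 3 \end{pmatrix} = \begin{pmatrix} 10 & 13 \\ 5 & 12 \end{pmatrix}$$

If AB is defined, then the number of rows of AB is the same as the number of rows of A, and the number of columns is the same as the number of columns of B:

$$A_{m\times n}B_{n\times p} = (AB)_{m\times p}.$$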
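The definition of the product translates almost word for word into code. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; the helper name `matmul` is ours, not part of the notes. It reproduces the 2 × 3 times 3 × 2 example above.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows, straight from the definition."""
    k = len(A[0])                      # number of columns of A
    if k != len(B):                    # must equal the number of rows of B
        raise ValueError("columns of A must equal rows of B")
    # (AB)_ij = sum over s of a_is * b_sj
    return [[sum(A[i][s] * B[s][j] for s in range(k))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3],
     [-1, 0, 4]]          # 2 x 3
B = [[-1, 0],
     [4, 2],
     [1, 3]]              # 3 x 2
AB = matmul(A, B)         # 2 x 2
BA = matmul(B, A)         # 3 x 3: defined here, but a different size
print(AB)
```

Note that the result has as many rows as A and as many columns as B, as stated above.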


Why define multiplication like this? The answer is that this is the definition that 


corresponds to what shows up in practice. 


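Example: Recall from calculus (Exercise!) that if a point (x, y) in the plane is rotated counterclockwise about the origin through an angle θ to obtain a new point (x', y'), then

$$x' = x\cos\theta - y\sin\theta$$
$$y' = x\sin\theta + y\cos\theta.$$

In matrix notation, this can be written

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.$$

If the new point (x', y') is now rotated through an additional angle φ to get (x'', y''), then

$$\begin{aligned}
\begin{pmatrix} x'' \\ y'' \end{pmatrix}
&= \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} \\
&= \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} \\
&= \begin{pmatrix} \cos\theta\cos\phi - \sin\theta\sin\phi & -(\cos\theta\sin\phi + \sin\theta\cos\phi) \\ \cos\theta\sin\phi + \sin\theta\cos\phi & \cos\theta\cos\phi - \sin\theta\sin\phi \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} \\
&= \begin{pmatrix} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}.
\end{aligned}$$

This is obviously the correct answer, since it shows that the point has been rotated through the total angle of θ + φ. The correct answer is given by matrix multiplication as we've defined it, and not some other way.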
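The rotation identity can also be checked numerically. This is only a sketch: the angles 0.3 and 1.1 are arbitrary values chosen for illustration, and `rotation` and `matmul` are helper names introduced here.

```python
import math

def rotation(angle):
    """2 x 2 matrix of a counterclockwise rotation through `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s],
            [s, c]]

def matmul(A, B):
    """Matrix product from the definition (lists of rows)."""
    return [[sum(A[i][s] * B[s][j] for s in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

theta, phi = 0.3, 1.1                      # arbitrary test angles
lhs = matmul(rotation(phi), rotation(theta))
rhs = rotation(theta + phi)

# Largest entrywise difference; zero up to floating-point rounding.
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

A numerical check like this is not a proof, but it is a quick way to catch a sign error in the trigonometric identities.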


• Matrix multiplication is not commutative: in English, AB ≠ BA for arbitrary matrices A and B. For instance, if A is 3 × 5 and B is 5 × 2, then AB is 3 × 2, but BA is not defined. Even if both matrices are square and of the same size, so that both AB and BA are defined and have the same size, the two products are not generally equal.


Exercise: Write down two 2 x 2 matrices and compute both products. Unless you’ve 
been very selective, the two products won’t be equal. Can you think of cases in which 


they are equal? 


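Another example: If

$$A = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad\text{and}\quad B = \begin{pmatrix} 1 & 2 \end{pmatrix},$$

then

$$AB = \begin{pmatrix} 2 & 4 \\ 3 & 6 \end{pmatrix}, \quad\text{while}\quad BA = (8).$$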


• Two properties of matrix multiplication:

1. If AB and AC are defined, then A(B + C) = AB + AC.

2. If AB is defined, and c is a scalar, then A(cB) = c(AB).

(Although we won't do it here, both these properties can be proven by showing that, in each equation, the (i, j) entry on the right hand side of the equation is equal to the (i, j) entry on the left.)


Definition: The transpose of the matrix A, denoted $A^t$, is obtained from A by making the first row of A into the first column of $A^t$, the second row of A into the second column of $A^t$, and so on. Formally,

$$a^t_{ij} = a_{ji}.$$

So

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}^t = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}.$$

Here's a standard consequence of the non-commutativity of matrix multiplication: If AB is defined, then $(AB)^t = B^tA^t$ (not $A^tB^t$ as you might expect).


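Example: If

$$A = \begin{pmatrix} 2 & 1 \\ 3 & 0 \end{pmatrix}, \quad\text{and}\quad B = \begin{pmatrix} -1 & 2 \\ 4 & 3 \end{pmatrix},$$

then

$$AB = \begin{pmatrix} 2 & 7 \\ -3 & 6 \end{pmatrix}, \quad\text{so}\quad (AB)^t = \begin{pmatrix} 2 & -3 \\ 7 & 6 \end{pmatrix}.$$

And

$$B^tA^t = \begin{pmatrix} -1 & 4 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} 2 & 3 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & -3 \\ 7 & 6 \end{pmatrix}$$

as advertised.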
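The same check can be carried out mechanically. The sketch below assumes the example matrices are A = (2 1; 3 0) and B = (-1 2; 4 3), which is what the products in the example imply; `transpose` and `matmul` are helper names introduced here.

```python
def transpose(A):
    """Rows of A become the columns of the result."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product from the definition (lists of rows)."""
    return [[sum(A[i][s] * B[s][j] for s in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 1],
     [3, 0]]
B = [[-1, 2],
     [4, 3]]

left = transpose(matmul(A, B))                 # (AB)^t
right = matmul(transpose(B), transpose(A))     # B^t A^t
wrong = matmul(transpose(A), transpose(B))     # A^t B^t, the tempting mistake
print(left, right, wrong)
```

The first two results agree, while the third differs, which illustrates why the order must be reversed.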


Definition: A is square if it has the same number of rows and columns. An important instance is the identity matrix $I_n$, which has ones on the main diagonal and zeros elsewhere:

Example:

$$I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Often, we'll just write I without the subscript for an identity matrix, when the dimension is clear from the context. The identity matrices behave, in some sense, like the number 1. If A is n × m, then $I_nA = A$, and $AI_m = A$.


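Definition: Suppose A and B are square matrices of the same dimension, and suppose that AB = I = BA. Then B is said to be the inverse of A, and we write this as $B = A^{-1}$. Similarly, $B^{-1} = A$. For instance, you can easily check that

$$\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

and so these two matrices are inverses of one another:

$$\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.$$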


Example: Not every square matrix has an inverse. For instance 


has no inverse. 


Exercise: Show that the matrix A in the above example has no inverse. Hint: Suppose that

$$B = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is the inverse of A. Then we must have BA = I. Write this out and show that the equations for the entries of B are inconsistent.
Exercise: Which 1 x 1 matrices are invertible, and what are their inverses? 


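Exercise: Show that if

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad\text{and } ad - bc \neq 0, \quad\text{then}\quad A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

If ad - bc = 0, then the matrix is not invertible. You should probably memorize this formula.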
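As a sanity check on the formula, here is a small sketch using exact rational arithmetic from Python's standard `fractions` module. The function name `inverse_2x2` and the convention of returning `None` for a non-invertible matrix are choices made here for illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]], or None if ad - bc = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None                     # not invertible
    f = Fraction(1) / det               # exact 1 / (ad - bc)
    return [[d * f, -b * f],
            [-c * f, a * f]]

inv = inverse_2x2([[2, 1], [1, 1]])     # the pair of inverses shown earlier
singular = inverse_2x2([[1, 2], [2, 4]])  # ad - bc = 0 (a made-up example)
print(inv, singular)
```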


2 Matrices and systems of linear equations 


You have all seen systems of linear equations such as

$$3x + 4y = 5 \qquad (1)$$
$$2x - y = 0. \qquad (2)$$

This system can easily be solved: just multiply the 2nd equation by 4, and add the two resulting equations to get 11x = 5 or x = 5/11. Substituting this into either equation gives y = 10/11. In this case, a solution exists (obviously) and is unique (there's just one solution, namely (5/11, 10/11)).


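We can write this system as a matrix equation, that is, in the form Ax = y:

$$\begin{pmatrix} 3 & 4 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix} \qquad (3)$$

Here

$$A = \begin{pmatrix} 3 & 4 \\ 2 & -1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad\text{and}\quad \mathbf{y} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}.$$

This works because if we multiply the two matrices on the left, we get the 2 × 1 matrix equation

$$\begin{pmatrix} 3x + 4y \\ 2x - y \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}.$$

And the two matrices are equal if both their entries are equal, which gives us the two equations (1) and (2).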


Of course, rewriting the system in matrix form does not, by itself, simplify the way in which we solve it. The simplification results from the following observation: the variables x and y can be eliminated from the computation by simply writing down a matrix in which the coefficients of x are in the first column, the coefficients of y in the second, and the right hand side of the system is the third column:

$$\begin{pmatrix} 3 & 4 & 5 \\ 2 & -1 & 0 \end{pmatrix} \qquad (4)$$

We are using the columns as "place markers" instead of x, y and the = sign. That is, the first column consists of the coefficients of x, the second has the coefficients of y, and the third has the numbers on the right hand side of (1) and (2).

Definition: The matrix in (4) is called the augmented matrix of the system, and can be written in matrix shorthand as (A|y).


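We can do exactly the same operations on this matrix as we did on the original system¹:

$$\begin{pmatrix} 3 & 4 & 5 \\ 8 & -4 & 0 \end{pmatrix} \quad \text{: Multiply the 2nd eqn by 4}$$

$$\begin{pmatrix} 3 & 4 & 5 \\ 11 & 0 & 5 \end{pmatrix} \quad \text{: Add the 1st eqn to the 2nd}$$

$$\begin{pmatrix} 3 & 4 & 5 \\ 1 & 0 & \frac{5}{11} \end{pmatrix} \quad \text{: Divide the 2nd eqn by 11}$$

The second equation now reads 1·x + 0·y = 5/11, and we've solved for x; we can now substitute for x in the first equation to solve for y as above.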
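The three steps above can be mimicked directly on a list-of-rows representation of the augmented matrix. This is a minimal sketch using exact fractions; the variable names are ours.

```python
from fractions import Fraction

# Augmented matrix of 3x + 4y = 5, 2x - y = 0 (rows are equations).
M = [[Fraction(3), Fraction(4), Fraction(5)],
     [Fraction(2), Fraction(-1), Fraction(0)]]

M[1] = [4 * v for v in M[1]]                  # multiply the 2nd eqn by 4
M[1] = [u + v for u, v in zip(M[0], M[1])]    # add the 1st eqn to the 2nd
M[1] = [v / 11 for v in M[1]]                 # divide the 2nd eqn by 11

x = M[1][2]                                   # row 2 now reads 1*x + 0*y = 5/11
y = (M[0][2] - M[0][0] * x) / M[0][1]         # back-substitute into 3x + 4y = 5
print(x, y)
```

Using `Fraction` rather than floating point keeps the arithmetic exact, so the result can be compared with the hand computation.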


Even though the solution to the system of equations is unique, it can be solved in many 
different ways (all of which, clearly, must give the same answer). Here are two other ways 


to solve it, both using the augmented matrix. As before, start with 


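$$\begin{pmatrix} 3 & 4 & 5 \\ 2 & -1 & 0 \end{pmatrix}$$

$$\begin{pmatrix} 1 & 5 & 5 \\ 2 & -1 & 0 \end{pmatrix} \quad \text{: Replace eqn 1 with eqn 1 - eqn 2}$$

$$\begin{pmatrix} 1 & 5 & 5 \\ 0 & -11 & -10 \end{pmatrix} \quad \text{: Subtract 2 times eqn 1 from eqn 2}$$

$$\begin{pmatrix} 1 & 5 & 5 \\ 0 & 1 & \frac{10}{11} \end{pmatrix} \quad \text{: Divide eqn 2 by -11 to get y = 10/11}$$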


¹ The purpose of this lecture is to remind you of the mechanics for solving simple linear systems. We'll give precise definitions and statements of the algorithms later.


Now the second equation tells us that y = 10/11, and we can substitute this into the first equation x + 5y = 5 to get x = 5/11. We could even take this one step further:

$$\begin{pmatrix} 1 & 0 & \frac{5}{11} \\ 0 & 1 & \frac{10}{11} \end{pmatrix} \quad \text{: We added -5*eqn 2 to eqn 1}$$
Now the complete solution can just be read off from the matrix. What we’ve done is to 


eliminate x from the second equation, (the 0 in position (2,1)) and y from the first (the 0 in 


position (1,2)). 


Exercise: What's wrong with writing the final matrix as

$$\begin{pmatrix} 1 & 0 & 0.45 \\ 0 & 1 & 0.91 \end{pmatrix}?$$


Exercise: (Do this BEFORE continuing with the text!) The system we just looked at consisted of two linear equations in two unknowns. Each equation, by itself, is the equation of a line in the plane and so has infinitely many solutions. To solve both equations simultaneously, we need to find the points, if any, which lie on both lines. There are 3 possibilities: (a) there's just one (the usual case), (b) there is no solution (if the two lines are parallel and distinct), or (c) there are an infinite number of solutions (if the two lines coincide).

Given all this food for thought, what are the possibilities for 2 equations in 3 unknowns? That is, what geometric object does each equation represent, and what are the possibilities for solution(s)?
Let's throw another variable into the mix and consider two equations in three unknowns:

$$\begin{aligned} 2x - 4y + z &= 1 \\ 4x + y - z &= 3 \end{aligned} \qquad (5)$$

Rather than solving this directly, we'll work with the augmented matrix for the system, which is

$$\begin{pmatrix} 2 & -4 & 1 & 1 \\ 4 & 1 & -1 & 3 \end{pmatrix}.$$

We proceed in more or less the same manner as above. That is, we try to eliminate x from the second equation, and y from the first, by doing simple operations on the matrix. Before we start, observe that each time we do such an "operation", we are, in effect, replacing the original system of equations by an equivalent system which has the same solutions. For instance, if we multiply equation 1 by the number 2, we get a "new" equation 1 which has exactly the same solutions as the original. This is also true if we replace, say, equation 2 with equation 2 plus some multiple of equation 1. (Can you see why?)


So, to business: 


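$$\begin{pmatrix} 1 & -2 & \frac12 & \frac12 \\ 4 & 1 & -1 & 3 \end{pmatrix} \quad \text{: Mult eqn 1 by 1/2}$$

$$\begin{pmatrix} 1 & -2 & \frac12 & \frac12 \\ 0 & 9 & -3 & 1 \end{pmatrix} \quad \text{: Mult eqn 1 by -4 and add it to eqn 2}$$

$$\begin{pmatrix} 1 & -2 & \frac12 & \frac12 \\ 0 & 1 & -\frac13 & \frac19 \end{pmatrix} \quad \text{: Mult eqn 2 by 1/9} \qquad (6)$$

$$\begin{pmatrix} 1 & 0 & -\frac16 & \frac{13}{18} \\ 0 & 1 & -\frac13 & \frac19 \end{pmatrix} \quad \text{: Add 2*eqn 2 to eqn 1} \qquad (7)$$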


The matrix (6) is called an echelon form of the augmented matrix. The matrix (7) is called the reduced echelon form. (Precise definitions of these terms will be given in the next lecture.) Either one can be used to solve the system of equations. Working with the echelon form in (6), the two equations now read

$$x - 2y + z/2 = 1/2$$
$$y - z/3 = 1/9.$$

So y = z/3 + 1/9. Substituting this into the first equation gives

$$x = 2y - z/2 + 1/2 = 2(z/3 + 1/9) - z/2 + 1/2 = z/6 + 13/18.$$

Exercise: Verify that the reduced echelon matrix (7) gives exactly the same solutions. This is as it should be. All "equivalent" systems of equations have the same solutions.


We see that for any choice of z, we get a solution to (5). If we take z = 0, then the solution is x = 13/18, y = 1/9. But if z = 1, then x = 8/9, y = 4/9 is the solution. Similarly for any other choice of z, which for this reason is called a free variable. If we write z = t, a more familiar expression for the solution is

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \frac{t}{6} + \frac{13}{18} \\ \frac{t}{3} + \frac19 \\ t \end{pmatrix} = t\begin{pmatrix} \frac16 \\ \frac13 \\ 1 \end{pmatrix} + \begin{pmatrix} \frac{13}{18} \\ \frac19 \\ 0 \end{pmatrix} \qquad (8)$$

This is of the form r(t) = tv + a, and you will recognize it as the (vector) parametric form of a line in $\mathbb{R}^3$. This (with t a free variable) is called the general solution to the system (5). If we choose a particular value of t, say t = 37, and substitute into (8), then we have a particular solution.
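It is easy to confirm that every value of the parameter really does give a solution of the system 2x - 4y + z = 1, 4x + y - z = 3. The sketch below substitutes a few arbitrarily chosen values of t; `solution` and `residuals` are names introduced here for illustration.

```python
from fractions import Fraction

def solution(t):
    """Point on the solution line: (t/6 + 13/18, t/3 + 1/9, t)."""
    t = Fraction(t)
    return (t / 6 + Fraction(13, 18), t / 3 + Fraction(1, 9), t)

def residuals(x, y, z):
    """Left side minus right side of each equation; both are 0 for a solution."""
    return (2 * x - 4 * y + z - 1,
            4 * x + y - z - 3)

# A handful of parameter values, including t = 37 from the text.
checks = [residuals(*solution(t)) for t in (0, 1, 37, -5)]
print(checks)
```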


Exercises: Write down the augmented matrix and solve these. If there are free variables, write your answer in the form given in (8) above. Also, give a geometric interpretation of the solution set (e.g., the common intersection of three planes in $\mathbb{R}^3$).

1.
3x + 2y - 4z = 3
-x - 2y + 3z = 4

2.
2x - 4y = 3
3x + 2y = -1
fy = 10 
3. 


It is now time to put on our mathematician’s hats and think about what we’ve just been 


doing: 


e Can we formalize the algorithm we’ve been using to solve these equations? 


e Can we show that the algorithm always works? That is, are we guaranteed to get all 


the solutions if we use the algorithm? 


To begin with, let's write down the different "operations" we've been using on the systems of equations and on the corresponding augmented matrices:


1. We can multiply any equation by a non-zero real number (scalar). The corresponding 


matrix operation consists of multiplying a row of the matrix by a scalar. 


2. We can replace any equation by the original equation plus a scalar multiple of another 
equation. Equivalently, we can replace any row of a matrix by that row plus a multiple 


of another row. 


3. We can interchange two equations (or two rows of the augmented matrix); we haven’t 


needed to do this yet, but sometimes it’s necessary, as we'll see in a bit. 


Definition: These three operations are called elementary row operations. 
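Each elementary row operation is a one-line transformation of a list of rows. The following sketch is one possible encoding, with function names of our own choosing and rows indexed from 0; it replays the second solution of the 2 × 2 system from earlier in this lecture.

```python
from fractions import Fraction

def scale_row(M, i, c):
    """Operation 1: multiply row i by a non-zero scalar c."""
    if c == 0:
        raise ValueError("the scalar must be non-zero")
    return [[c * v for v in row] if r == i else row[:]
            for r, row in enumerate(M)]

def add_multiple(M, src, dst, c):
    """Operation 2: replace row dst by row dst + c * row src."""
    return [[v + c * u for u, v in zip(M[src], row)] if r == dst else row[:]
            for r, row in enumerate(M)]

def swap_rows(M, i, j):
    """Operation 3: interchange rows i and j."""
    N = [row[:] for row in M]
    N[i], N[j] = N[j], N[i]
    return N

# Rows are indexed from 0, so "eqn 2" in the text is row 1 here.
M = [[Fraction(3), Fraction(4), Fraction(5)],
     [Fraction(2), Fraction(-1), Fraction(0)]]
M = add_multiple(M, 1, 0, -1)            # eqn 1 - eqn 2        -> (1  5   5)
M = add_multiple(M, 0, 1, -2)            # eqn 2 - 2 * eqn 1    -> (0 -11 -10)
M = scale_row(M, 1, Fraction(-1, 11))    # divide eqn 2 by -11  -> (0  1  10/11)
print(M)
```

Each function returns a new matrix rather than modifying its argument, which makes it easy to keep intermediate steps around for checking.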


In the next lecture, we'll assemble the solution algorithm, and show that it can be reformulated in terms of matrix multiplication.

3 Elementary row operations and their corresponding matrices


As we'll see shortly, each of the 3 elementary row operations can be performed by multiplying the augmented matrix (A|y) on the left by what we'll call an elementary matrix. Just so this doesn't come as a total shock, let's look at some simple matrix operations:

• Suppose EA is defined, and suppose the first row of E is (1, 0, 0, ..., 0). Then the first row of EA is identical to the first row of A.

• Similarly, if the i-th row of E is all zeros except for a 1 in the i-th slot, then the i-th row of the product EA is identical to the i-th row of A.

• It follows that if we want to change only row i of the matrix A, we should multiply A on the left by some matrix E with the following property: every row of E except row i should be the corresponding row of the identity matrix.

The procedure that we illustrate below can be (and is) used to reduce any matrix to echelon form (not just augmented matrices).


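Example: Let

$$A = \begin{pmatrix} 3 & 4 & 5 \\ 2 & -1 & 0 \end{pmatrix}.$$

1. To multiply the first row of A by 1/3, we can multiply A on the left by the elementary matrix

$$E_1 = \begin{pmatrix} \frac13 & 0 \\ 0 & 1 \end{pmatrix}.$$

The result is

$$E_1A = \begin{pmatrix} 1 & \frac43 & \frac53 \\ 2 & -1 & 0 \end{pmatrix}.$$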


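You should check this on your own. Same with the remaining computations.

2. To add -2*row 1 to row 2 in the resulting matrix, multiply it by

$$E_2 = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}$$

to obtain

$$E_2E_1A = \begin{pmatrix} 1 & \frac43 & \frac53 \\ 0 & -\frac{11}{3} & -\frac{10}{3} \end{pmatrix}.$$

3. Now multiply row 2 by -3/11 using the matrix

$$E_3 = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{3}{11} \end{pmatrix},$$

yielding

$$E_3E_2E_1A = \begin{pmatrix} 1 & \frac43 & \frac53 \\ 0 & 1 & \frac{10}{11} \end{pmatrix}.$$

4. Finally, we clean out the second column by adding (-4/3)row 2 to row 1. We multiply by

$$E_4 = \begin{pmatrix} 1 & -\frac43 \\ 0 & 1 \end{pmatrix},$$

obtaining

$$E_4E_3E_2E_1A = \begin{pmatrix} 1 & 0 & \frac{5}{11} \\ 0 & 1 & \frac{10}{11} \end{pmatrix}.$$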

Of course we get the same result as before, so why bother? The answer is that we’re in the 


process of developing an algorithm that will work in the general case. So it’s about time to 


formally identify our goal in the general case. We begin with some definitions. 


Definition: The leading entry of a matrix row is the first non-zero entry in the row, starting 


from the left. A row without a leading entry is a row of zeros. 


Definition: The matrix R is said to be in echelon form provided that

1. The leading entry of every non-zero row is a 1.

2. If the leading entry of row i is in position k, and the next row is not a row of zeros, then the leading entry of row i + 1 is in position k + j, where j ≥ 1.

3. All zero rows are at the bottom of the matrix.

The following matrices are in echelon form:

$$\begin{pmatrix} 1 & * & * \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Here the asterisks (*) stand for any number at all, including 0.


Definition: The matrix R is said to be in reduced echelon form if (a) R is in echelon form, and (b) each leading entry is the only non-zero entry in its column. The reduced echelon form of a matrix is also called the Gauss-Jordan form.

The following matrices are in reduced row echelon form:

$$\begin{pmatrix} 1 & * & 0 & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Exercise: Suppose A is 3 × 5. What is the maximum number of leading 1's that can appear when it's been reduced to echelon form? Same question for $A_{5\times 3}$. Can you generalize your results to a statement for $A_{m\times n}$? (State it as a theorem.)


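Once a matrix has been brought to echelon form, it can be put into reduced echelon form by cleaning out the non-zero entries in any column containing a leading 1. For example, if

$$R = \begin{pmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

which is in echelon form, then it can be reduced to Gauss-Jordan form by adding (-2)row 2 to row 1, and then (-3)row 3 to row 1. Thus

$$\begin{pmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & -5 & 3 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and

$$\begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & -5 & 3 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & -5 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Note that column 3 cannot be "cleaned out" since there's no leading 1 there.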


There is one more elementary row operation and corresponding elementary matrix we may 


need. Suppose we want to reduce the following matrix to Gauss-Jordan form 


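$$A = \begin{pmatrix} 2 & 2 & -1 \\ 0 & 0 & 3 \\ 1 & -1 & 2 \end{pmatrix}.$$

Multiplying row 1 by 1/2, and then adding -row 1 to row 3 leads to

$$E_2E_1A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}\begin{pmatrix} \frac12 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 2 & -1 \\ 0 & 0 & 3 \\ 1 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & -\frac12 \\ 0 & 0 & 3 \\ 0 & -2 & \frac52 \end{pmatrix}.$$

Now we can clearly do 2 more operations to get a leading 1 in the (2,3) position, and another leading 1 in the (3,2) position. But this won't be in echelon form (why not?). We need to interchange rows 2 and 3. This corresponds to changing the order of the equations, and evidently doesn't change the solutions. We can accomplish this by multiplying on the left with a matrix obtained from I by interchanging rows 2 and 3:

$$E_3E_2E_1A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 & -\frac12 \\ 0 & 0 & 3 \\ 0 & -2 & \frac52 \end{pmatrix} = \begin{pmatrix} 1 & 1 & -\frac12 \\ 0 & -2 & \frac52 \\ 0 & 0 & 3 \end{pmatrix}.$$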

Exercise: Without doing any further computation, write down the Gauss-Jordan form for 
this matrix. 
Exercise: Use elementary matrices to reduce

$$A = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}$$

to Gauss-Jordan form. You should wind up with an expression of the form

$$E_k \cdots E_2E_1A = I.$$

What can you say about the matrix $B = E_k \cdots E_2E_1$?

4 Elementary matrices, continued

We have identified 3 types of row operations and their corresponding elementary matrices. If you check the previous examples, you'll find that these matrices are constructed by performing the given row operation on the identity matrix:

1. To multiply $\mathrm{row}_j(A)$ by the scalar c, use the matrix E obtained from I by multiplying the j-th row of I by c.

2. To add $c\,\mathrm{row}_j(A)$ to $\mathrm{row}_k(A)$, use the identity matrix with its k-th row replaced by (..., c, ..., 1, ...). Here c is in position j and the 1 is in position k. All other entries are 0.

3. To interchange rows j and k, use the identity matrix with rows j and k interchanged.

4.1 Properties of elementary matrices


1. Elementary matrices are always square. If the operation is to be performed on $A_{m\times n}$, then the elementary matrix E is m × m. So the product EA has the same dimension as the original matrix A.

2. Elementary matrices are invertible. If E is elementary, then $E^{-1}$ is the matrix which undoes the operation that created E, and $E^{-1}EA = IA = A$; the matrix followed by its inverse does nothing to A. Examples:

• The matrix

$$E = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}$$

adds $(-2)\,\mathrm{row}_1(A)$ to $\mathrm{row}_2(A)$. Its inverse is

$$E^{-1} = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix},$$

which adds $(2)\,\mathrm{row}_1(A)$ to $\mathrm{row}_2(A)$.

• If E multiplies the second row by 5, then $E^{-1}$ multiplies the second row by 1/5.

• If E interchanges two rows, then $E = E^{-1}$.
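These three statements about inverses are easy to verify for 2 × 2 elementary matrices. The sketch below builds each matrix by performing the row operation on the identity, as described above; the variable names are ours and the 2 × 2 size is chosen only for brevity.

```python
from fractions import Fraction

def identity(n):
    """n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Matrix product from the definition (lists of rows)."""
    return [[sum(A[i][s] * B[s][j] for s in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Add -2 * row 1 to row 2, and the operation that undoes it.
E_add = identity(2);      E_add[1][0] = Fraction(-2)
E_add_inv = identity(2);  E_add_inv[1][0] = Fraction(2)

# Multiply row 2 by 5, and by 1/5.
E_scale = identity(2);      E_scale[1][1] = Fraction(5)
E_scale_inv = identity(2);  E_scale_inv[1][1] = Fraction(1, 5)

# Interchange the two rows: this matrix is its own inverse.
E_swap = [[0, 1], [1, 0]]

products = [matmul(E_add_inv, E_add),
            matmul(E_scale_inv, E_scale),
            matmul(E_swap, E_swap)]
print(products)
```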


Exercises: 


1. If A is 3 × 4, what is the elementary matrix that (a) subtracts $7\,\mathrm{row}_3(A)$ from $\mathrm{row}_2(A)$? (b) interchanges the first and third rows? (c) multiplies $\mathrm{row}_1(A)$ by 2?

2. What are the inverses of the matrices in exercise 1?

3. Do elementary matrices commute? That is, does it matter in which order they're multiplied? Give an example or two to illustrate your answer.


4.2 The algorithm for Gaussian elimination 


We can now state the algorithm which will reduce any matrix first to row echelon form, and 


then, if needed to reduced echelon form: 


1. Begin with the (1,1) entry. If it's some a ≠ 0, divide through row 1 by a to get a 1 in the (1,1) position. If it is zero, then interchange row 1 with another row to get a nonzero (1,1) entry and proceed as above. If every entry in column 1 is zero, go to the top of column 2 and, by multiplication and permuting rows if necessary, get a 1 in the (1,2) slot. If column 2 won't work, then go to column 3, etc. If you can't arrange for a leading 1 somewhere in row 1, then your original matrix was the zero matrix, and it's already reduced.

2. You now have a leading 1 in some column. Use this leading 1 and operations of the type $(a)\,\mathrm{row}_i(A) + \mathrm{row}_k(A) \to \mathrm{row}_k(A)$ to replace every entry in the column below the location of the leading 1 by 0. In other words, the column will now look like

$$\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

3. Now move one column to the right, and one row down and attempt to repeat the process, getting a leading 1 in this location. You may need to permute this row with a row below it. If it's not possible to get a non-zero entry in this position, move right one column and try again. At the end of this second procedure, your matrix might look like

$$\begin{pmatrix} 1 & * & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & * \end{pmatrix},$$

where the second leading entry is in column 3. Notice that once a leading 1 has been installed in the (1,1) position, none of the subsequent row operations will change any of the elements in column 1. Similarly, for the matrix above, no subsequent row operations in our reduction process will change any of the entries in the first 3 columns.

4. The process continues until there are no more positions for leading entries: we either run out of rows or columns or both because the matrix has only a finite number of each. We have arrived at the row echelon form.
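The four steps above can be sketched as a short function. This is one possible rendering, not a definitive implementation: it assumes the matrix is a list of rows of integers or fractions, it always picks the first usable row as the pivot row, and the name `row_echelon` is ours.

```python
from fractions import Fraction

def row_echelon(M):
    """Return a row echelon form (with leading 1s) of M, a list of rows."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                   # row where the next leading 1 goes
    for c in range(cols):
        # Look for a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # nothing usable: move right one column
        M[r], M[pivot] = M[pivot], M[r]     # interchange rows if necessary
        a = M[r][c]
        M[r] = [v / a for v in M[r]]        # divide through to get a leading 1
        for i in range(r + 1, rows):        # zero out the entries below it
            m = M[i][c]
            M[i] = [v - m * u for u, v in zip(M[r], M[i])]
        r += 1
        if r == rows:
            break                           # ran out of rows
    return M

# The 3 x 3 matrix from the row-interchange example, and the 2 x 4 augmented matrix.
E = row_echelon([[2, 2, -1], [0, 0, 3], [1, -1, 2]])
G = row_echelon([[2, -4, 1, 1], [4, 1, -1, 3]])
print(E)
print(G)
```

Working with `Fraction` avoids the rounding issues that a floating-point version would have to handle when testing whether an entry is zero.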


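The three matrices below are all in row echelon form:

$$\begin{pmatrix} 1 & * & * & * & * \\ 0 & 0 & 1 & * & * \\ 0 & 0 & 0 & 1 & * \end{pmatrix}, \quad\text{or}\quad \begin{pmatrix} 1 & * & * \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad\text{or}\quad \begin{pmatrix} 1 & * & * \\ 0 & 1 & * \\ 0 & 0 & 1 \end{pmatrix} \qquad (1)$$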


Remark: The description of the algorithm doesn't involve elementary matrices. As a practical matter, it's much simpler to just do the row operation directly on A, instead of writing down an elementary matrix and multiplying the matrices. But the fact that we could do this with the elementary matrices will turn out to be very useful theoretically.


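Exercise: Find the echelon form for each of the following:

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{pmatrix}, \quad \begin{pmatrix} 0 & 4 \\ 7 & -2 \end{pmatrix}, \quad (3, 4), \quad \begin{pmatrix} 3 & 2 & -1 & 4 \\ 2 & -5 & 2 & 6 \end{pmatrix}$$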


4.3. Observations 


• The leading entries progress strictly downward, from left to right. We could just as easily have written an algorithm in which the leading entries progress downward as we move from right to left.

• The row echelon form of the matrix is upper triangular: any entry $a_{ij}$ with i > j satisfies $a_{ij} = 0$.

• To continue the reduction to Gauss-Jordan form, it is only necessary to use each leading 1 to clean out any remaining non-zero entries in its column. For the first matrix in (1) above, the Gauss-Jordan form will look like

$$\begin{pmatrix} 1 & * & 0 & 0 & * \\ 0 & 0 & 1 & 0 & * \\ 0 & 0 & 0 & 1 & * \end{pmatrix}$$

(Of course, cleaning out the columns may lead to changes in the entries labelled with *.)
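For completeness, here is a sketch of the full reduction to Gauss-Jordan form. It is a variant of the two-stage description in the text: instead of first reaching echelon form and then cleaning out the columns, it clears the entries above and below each leading 1 in a single pass. The name `gauss_jordan` is ours.

```python
from fractions import Fraction

def gauss_jordan(M):
    """Return the reduced echelon (Gauss-Jordan) form of M, a list of rows."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                   # row where the next leading 1 goes
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # no leading 1 possible in this column
        M[r], M[pivot] = M[pivot], M[r]     # interchange rows if necessary
        a = M[r][c]
        M[r] = [v / a for v in M[r]]        # make the leading entry a 1
        for i in range(rows):               # clean out the rest of the column
            if i != r:
                m = M[i][c]
                M[i] = [v - m * u for u, v in zip(M[r], M[i])]
        r += 1
        if r == rows:
            break
    return M

R = gauss_jordan([[1, 2, -1, 3], [0, 1, 2, 0], [0, 0, 0, 1]])
S = gauss_jordan([[3, 4, 5], [2, -1, 0]])
print(R)
print(S)
```

The first call reproduces the reduction of the echelon matrix R from the previous lecture, and the second recovers the solution x = 5/11, y = 10/11 of the 2 × 2 system.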


4.4 Application to the solution(s) of Ax = y 


Suppose that we have reduced the augmented matrix (A|y) to either echelon or Gauss-Jordan form. Then

1. If there is a leading 1 anywhere in the last column, the system Ax = y is inconsistent. That is, there is no x which satisfies the system of equations. Why?


2. If there’s no leading entry in the last column, then at least one solution exists. The 
question then becomes “How many solutions are there?” The answer to this question 


depends on the number of free variables: 


Definition: Suppose the augmented matrix for the linear system Ax = y has been brought 
to echelon form. If there is a leading 1 in any column except the last, then the corresponding 
variable is called a leading variable. For instance, if there’s a leading 1 in column 3, then x3 


is a leading variable. 


Definition: Any variable which is not a leading variable is a free variable. 


Example: Suppose the echelon form of (A|y) is

$$\begin{pmatrix} 1 & 3 & 3 & -2 \\ 0 & 0 & 1 & 4 \end{pmatrix}.$$

Then the original matrix A is 2 × 3, and if $x_1$, $x_2$, and $x_3$ are the variables in the original equations, we see that $x_1$ and $x_3$ are leading variables, and $x_2$ is a free variable.


• If the system is consistent and there are no free variables, then the solution is unique: there's just one. Here's an example of this:

$$\begin{pmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

• If the system is consistent and there are one or more free variables, then there are infinitely many solutions.

$$\begin{pmatrix} 1 & * & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

Here $x_2$ is a free variable, and we get a different solution for each of the infinite number of ways we could choose $x_2$.


e Just because there are free variables does not mean that the system is consistent. 


$$\begin{pmatrix} 1 & * & * \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$$

Here $x_2$ is a free variable, but the system is inconsistent because of the leading 1 in the last column. There are no solutions to this system.
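The case analysis above amounts to locating the leading entries. The following sketch classifies a system from the echelon form of its augmented matrix; the function name `classify`, the returned labels, and the numbers substituted for the asterisks in the sample matrices are all illustrative choices.

```python
def classify(echelon):
    """Classify Ax = y from the echelon form of (A|y), given as a list of rows.

    Returns (kind, free), where kind is 'inconsistent', 'unique' or
    'infinitely many', and free lists the free variables by 1-based index.
    """
    n_vars = len(echelon[0]) - 1            # the last column is the right hand side
    leading_cols = []
    for row in echelon:
        nonzero = [j for j, v in enumerate(row) if v != 0]
        if nonzero:
            leading_cols.append(nonzero[0])  # column of this row's leading entry
    if n_vars in leading_cols:               # leading entry in the last column
        return "inconsistent", []
    free = [j + 1 for j in range(n_vars) if j not in leading_cols]
    return ("unique" if not free else "infinitely many"), free

a = classify([[1, 3, 3, -2], [0, 0, 1, 4]])                 # x2 is free
b = classify([[1, 2, 5], [0, 1, 7]])                        # no free variables
c = classify([[1, 4, 6], [0, 0, 1], [0, 0, 0]])             # leading 1 in last column
print(a, b, c)
```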
5 Homogeneous systems

Definition: A homogeneous (ho-mo-geen'-ius) system of linear algebraic equations is one in which all the numbers on the right hand side are equal to 0:

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= 0 \\ &\ \ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$

In matrix form, this reads Ax = 0, where A is m × n,

$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}_{n\times 1},$$

and 0 is m × 1. The homogeneous system Ax = 0 always has the solution x = 0. It follows that any homogeneous system of equations is always consistent. Any non-zero solutions, if they exist, are said to be non-trivial solutions. These may or may not exist. We can find out by row reducing the corresponding augmented matrix (A|0).


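Example: Given the augmented matrix

$$\begin{pmatrix} 1 & 2 & 0 & -1 & 0 \\ -2 & -3 & 4 & 5 & 0 \\ 2 & 4 & 0 & -2 & 0 \end{pmatrix},$$

row reduction leads quickly to the echelon form

$$\begin{pmatrix} 1 & 2 & 0 & -1 & 0 \\ 0 & 1 & 4 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$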


Observe that nothing happened to the last column: row operations don't do anything to a column of zeros. In particular, doing a row operation on a system of homogeneous equations doesn't change the fact that it's homogeneous. For this reason, when working with homogeneous systems, we'll just use the matrix A. The echelon form of A is

$$\begin{pmatrix} 1 & 2 & 0 & -1 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

Here, the leading variables are $x_1$ and $x_2$, while $x_3$ and $x_4$ are free variables, since there are no leading entries in the third or fourth columns. Continuing along, we obtain the Gauss-Jordan form (You are working out all the details on your scratch paper as we go along, aren't you!?)

$$\begin{pmatrix} 1 & 0 & -8 & -7 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$


No further simplification is possible; any further row operations will destroy the Gauss-Jordan structure of the columns with leading entries. The resulting system of equations reads

$$x_1 - 8x_3 - 7x_4 = 0$$
$$x_2 + 4x_3 + 3x_4 = 0.$$

In principle, we're done in the sense that we have the solution in hand. However, it's customary to rewrite the solution in vector form so that its properties are more clearly displayed. First, we solve for the leading variables; everything else goes on the right hand side of the equations:

$$x_1 = 8x_3 + 7x_4$$
$$x_2 = -4x_3 - 3x_4.$$


Assigning any values we choose to the two free variables $x_3$ and $x_4$ gives us a solution to the original homogeneous system. This is, of course, why the variables are called "free". We can distinguish the free variables from the leading variables by denoting them as s, t, u, etc. Thus, setting $x_3 = s$, $x_4 = t$, we rewrite the solution in the form

$$\begin{aligned} x_1 &= 8s + 7t \\ x_2 &= -4s - 3t \\ x_3 &= s \\ x_4 &= t \end{aligned}$$

Better yet, the solution can also be written in matrix (vector) form as

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = s\begin{pmatrix} 8 \\ -4 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 7 \\ -3 \\ 0 \\ 1 \end{pmatrix} \qquad (1)$$


We call (1) the general solution to the homogeneous equation. The notation is misleading, 
since the left hand side x looks like a single vector, while the right hand side clearly represents 
an infinite collection of objects with 2 degrees of freedom. We'll address this later in the 


lecture. 


We won't do it here, but if we were to carry out the above procedure on a general homogeneous system $A_{m\times n}\mathbf{x} = 0$, we'd establish the following facts:

5.1 Properties of the homogeneous system for $A_{m\times n}$

• The number of leading variables is ≤ min(m, n).


e The number of non-zero equations in the echelon form of the system is equal to the 


number of leading entries. 


e The number of free variables plus the number of leading variables = n, the number of 


columns of A. 


• The homogeneous system Ax = 0 has non-trivial solutions if and only if there are free variables.

• If there are more unknowns than equations, the homogeneous system always has non-trivial solutions. Why? This is one of the few cases in which we can tell something about the solutions without doing any work.


e A homogeneous system of equations is always consistent (i.e., always has at least one 


solution). 


Exercise: What sort of geometric object does x, represent? 


There are two other fundamental properties: 


1. Theorem: If x is a solution to Ax = 0, then so is cx for any real number c.

Proof: x is a solution means Ax = 0. But A(cx) = c(Ax) = c0 = 0, so cx is also a solution.


2. Theorem: If x and y are two solutions to the homogeneous equation, then so is x + y. 


Proof: A(x+y) = Ax+ Ay =0+0=0. 


These two properties constitute the famous principle of superposition which holds for homo- 


geneous systems (but NOT for inhomogeneous ones). 


Definition: if x and y are two vectors and s and t two scalars, then sx + ty is called a linear 


combination of x and y. 


Example: 3x — 47y is a linear combination of x and y. 


We can restate the superposition principle as: 


Superposition principle: if x and y are two solutions to the homogeneous equation Ax = 0, then any linear combination of x and y is also a solution.

Remark: This is just a compact way of restating the two properties: If x and y are solutions, then by property 1, sx and ty are also solutions. And by property 2, their sum sx + ty is a solution. Conversely, if sx + ty is a solution to the homogeneous equation for all s, t, then taking t = 0 gives property 1, and taking s = t = 1 gives property 2.


You have seen this principle at work in your calculus courses. 


Example: Suppose $\phi(x, y)$ satisfies Laplace's equation

\[ \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 0. \]

We write this as

\[ \Delta\phi = 0, \quad \text{where} \quad \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}. \]

The differential operator $\Delta$ has the same property as matrix multiplication, namely: if $\phi(x, y)$ and $\psi(x, y)$ are two twice-differentiable functions, and s and t any two real numbers, then

\[ \Delta(s\phi + t\psi) = s\Delta\phi + t\Delta\psi. \]

It follows that if $\phi$ and $\psi$ are two solutions to Laplace's equation, then any linear combination of $\phi$ and $\psi$ is also a solution. The principle of superposition also holds for solutions to the wave equation, Maxwell's equations in free space, and Schrödinger's equation in quantum mechanics.


Example: Start with "white" light (e.g., sunlight); it's a collection of electromagnetic waves which satisfy Maxwell's equations. Pass the light through a prism, obtaining red, orange, ..., violet light; these are also solutions to Maxwell's equations. The original solution (white light) is seen to be a superposition of many other solutions, corresponding to the various different colors. The process can be reversed to obtain white light again by passing the different colors of the spectrum through an inverted prism.


Referring back to the example (see Eqn (1)), if we set

\[ \mathbf{x} = \begin{pmatrix} 8 \\ -4 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad \mathbf{y} = \begin{pmatrix} 7 \\ -3 \\ 0 \\ 1 \end{pmatrix}, \]

then the superposition principle tells us that any linear combination of x and y is also a solution. In fact, these are all of the solutions to this system.
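As a quick numerical illustration of the superposition principle, the sketch below checks that several linear combinations sx + ty are again solutions of Ax = 0. It is not from the original notes: the helper names `matvec` and `combo` are made up for this example, and the matrix `A` is an assumption, namely the coefficient matrix of the running example as far as it can be read from these notes.

```python
from fractions import Fraction

def matvec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Assumed coefficient matrix of the running example
A = [[1, 2, 0, -1],
     [-2, -3, 4, 5],
     [2, 4, 0, -2]]

x = [8, -4, 1, 0]
y = [7, -3, 0, 1]

def combo(s, t):
    """The linear combination s*x + t*y."""
    return [s * xi + t * yi for xi, yi in zip(x, y)]

# A few arbitrary choices of the scalars s and t
scalars = [(1, 0), (0, 1), (3, -47), (Fraction(1, 3), Fraction(-5, 2))]
checks = [matvec(A, combo(s, t)) for s, t in scalars]
print(all(c == [0, 0, 0] for c in checks))
```

A finite number of checks proves nothing, of course; the two theorems above are what guarantee the result for every s and t.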


Definition: We write

\[ \mathbf{x}_H = \{ s\mathbf{x} + t\mathbf{y} : \forall \text{ real } s, t \} \]

and say that $\mathbf{x}_H$ is the general solution to the homogeneous system Ax = 0.

6 The inhomogeneous system Ax = y, y ≠ 0


Definition: The system Ax = y is inhomogeneous if it’s not homogeneous. 


Mathematicians love definitions like this! It means of course that the vector y is not the zero 


vector. And this means that at least one of the equations has a non-zero right hand side. 


As an example, we can use the same system as in the previous lecture, except we’ll change 


the right hand side to something non-zero: 


\[
\begin{aligned}
x_1 + 2x_2 - x_4 &= 1 \\
-2x_1 - 3x_2 + 4x_3 + 5x_4 &= 2 \\
2x_1 + 4x_2 - 2x_4 &= 3
\end{aligned}
\]


Those of you with sharp eyes should be able to tell at a glance that this system is inconsistent 
— that is, there are no solutions. Why? We’re going to proceed anyway because this is hardly 


an exceptional situation. 


The augmented matrix is 


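\[ (A \,\vdots\, \mathbf{y}) = \left(\begin{array}{rrrr|r} 1 & 2 & 0 & -1 & 1 \\ -2 & -3 & 4 & 5 & 2 \\ 2 & 4 & 0 & -2 & 3 \end{array}\right) \]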


We can’t discard the 5th column here since it’s not zero. The row echelon form of the 


augmented matrix is 


\[ \left(\begin{array}{rrrr|r} 1 & 2 & 0 & -1 & 1 \\ 0 & 1 & 4 & 3 & 4 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right) \]


And the reduced echelon form is 


\[ \left(\begin{array}{rrrr|r} 1 & 0 & -8 & -7 & 0 \\ 0 & 1 & 4 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right) \]

The third equation, from either of these, now reads

\[ 0x_1 + 0x_2 + 0x_3 + 0x_4 = 1, \quad \text{or} \quad 0 = 1. \]


This is false! How can we wind up with a false statement? The actual reasoning that led us here is this: If the original system has a solution, then performing elementary row operations gives us an equivalent system of equations which has the same solution. But this equivalent system of equations is inconsistent. It has no solutions; that is, no choice of $x_1, \ldots, x_4$ satisfies the equation. So the original system is also inconsistent.


In general: If the echelon form of (A:y) has a leading 1 in any position of the last column, 


the system of equations is inconsistent. 
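The criterion just stated is easy to test mechanically. Below is a minimal sketch, assuming the coefficient matrix of the example as reconstructed from these notes; the function names `rref` and `is_consistent` are illustrative, not from the notes, and the second right-hand side (1, 2, 2) is simply a consistent choice picked for contrast.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices), using exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def is_consistent(A, y):
    """Ax = y is inconsistent exactly when the last column of (A : y) holds a leading 1."""
    augmented = [list(row) + [yi] for row, yi in zip(A, y)]
    _, pivots = rref(augmented)
    last_column = len(A[0])
    return last_column not in pivots

A = [[1, 2, 0, -1],
     [-2, -3, 4, 5],
     [2, 4, 0, -2]]

print(is_consistent(A, [1, 2, 3]))  # right-hand side used in the text
print(is_consistent(A, [1, 2, 2]))  # a different right-hand side, same A
```

The same matrix A gives both outcomes, which is the point made in the next paragraph: consistency depends on the particular right-hand side.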


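Now it's not true that any inhomogeneous system with the same matrix A is inconsistent. It depends completely on the particular y which sits on the right hand side. For instance, if

\[ \mathbf{y} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \]

then (work this out!) the echelon form of $(A \,\vdots\, \mathbf{y})$ is

\[ \left(\begin{array}{rrrr|r} 1 & 2 & 0 & -1 & 1 \\ 0 & 1 & 4 & 3 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right) \]

and the reduced echelon form is

\[ \left(\begin{array}{rrrr|r} 1 & 0 & -8 & -7 & -7 \\ 0 & 1 & 4 & 3 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right) \]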


Since this is consistent, we have, as in the homogeneous case, the leading variables $x_1$ and $x_2$, and the free variables $x_3$ and $x_4$. Renaming the free variables by s and t, and writing out the equations solved for the leading variables gives us

\[
\begin{aligned}
x_1 &= 8s + 7t - 7 \\
x_2 &= -4s - 3t + 4 \\
x_3 &= s \\
x_4 &= t
\end{aligned}
\]


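This looks like the solution to the homogeneous equation found in the previous section except for the additional scalars −7 and +4 in the first two equations. If we rewrite this using vector notation, we get

\[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = s \begin{pmatrix} 8 \\ -4 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} 7 \\ -3 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} -7 \\ 4 \\ 0 \\ 0 \end{pmatrix} \]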


Compare this with the general solution $\mathbf{x}_H$ to the homogeneous equation found before. Once again, we have a 2-parameter family of solutions. We can get what is called a particular solution by making some specific choice for s and t. For example, taking s = t = 0, we get the particular solution

\[ \mathbf{x}_p = \begin{pmatrix} -7 \\ 4 \\ 0 \\ 0 \end{pmatrix}. \]

We can get other particular solutions by making other choices. Observe that the general solution to the inhomogeneous system worked out here can be written in the form $\mathbf{x} = \mathbf{x}_H + \mathbf{x}_p$. In fact, this is true in general:


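Theorem: Let $\mathbf{x}_p$ and $\mathbf{y}_p$ be two solutions to Ax = y. Then their difference $\mathbf{x}_p - \mathbf{y}_p$ is a solution to the homogeneous equation Ax = 0. The general solution to Ax = y can be written as $\mathbf{x}_p + \mathbf{x}_H$ where $\mathbf{x}_H$ denotes the general solution to the homogeneous system.

Proof: Since $\mathbf{x}_p$ and $\mathbf{y}_p$ are solutions, we have $A(\mathbf{x}_p - \mathbf{y}_p) = A\mathbf{x}_p - A\mathbf{y}_p = \mathbf{y} - \mathbf{y} = \mathbf{0}$. So their difference solves the homogeneous equation. Conversely, given a particular solution $\mathbf{x}_p$, the entire set $\mathbf{x}_p + \mathbf{x}_H$ consists of solutions to Ax = y: if z belongs to $\mathbf{x}_H$, then $A(\mathbf{x}_p + \mathbf{z}) = A\mathbf{x}_p + A\mathbf{z} = \mathbf{y} + \mathbf{0} = \mathbf{y}$ and so $\mathbf{x}_p + \mathbf{z}$ is a solution to Ax = y. Moreover, every solution w arises this way, since $\mathbf{w} = \mathbf{x}_p + (\mathbf{w} - \mathbf{x}_p)$ and, by the first part, $\mathbf{w} - \mathbf{x}_p$ belongs to $\mathbf{x}_H$.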
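The theorem can be sanity-checked on the running example. The sketch below is an illustration only: it assumes the coefficient matrix A, the right-hand side y = (1, 2, 2), and the vectors of the worked example as reconstructed from these notes, and the helper names `matvec` and `solution` are made up here.

```python
from fractions import Fraction

def matvec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Assumed data of the running example
A = [[1, 2, 0, -1],
     [-2, -3, 4, 5],
     [2, 4, 0, -2]]
y = [1, 2, 2]
x_p = [-7, 4, 0, 0]   # particular solution (s = t = 0)
v1 = [8, -4, 1, 0]    # the two vectors generating the homogeneous solutions
v2 = [7, -3, 0, 1]

def solution(s, t):
    """x_p + s*v1 + t*v2, a member of the set x_p + x_H."""
    return [p + s * a + t * b for p, a, b in zip(x_p, v1, v2)]

# Every member of x_p + x_H that we try solves Ax = y
params = [(0, 0), (1, 1), (-3, 5), (Fraction(1, 2), Fraction(-7, 3))]
all_solve = all(matvec(A, solution(s, t)) == y for s, t in params)

# The difference of two particular solutions solves the homogeneous system
y_p = solution(1, 1)
difference = [a - b for a, b in zip(y_p, x_p)]
homogeneous_residual = matvec(A, difference)

print(all_solve, y_p, homogeneous_residual)
```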


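Going back to the example, suppose we write the general solution to Ax = y in the vector form

\[ \mathbf{x} = s\mathbf{v}_1 + t\mathbf{v}_2 + \mathbf{x}_p, \]

where

\[ \mathbf{v}_1 = \begin{pmatrix} 8 \\ -4 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 7 \\ -3 \\ 0 \\ 1 \end{pmatrix}, \quad \text{and} \quad \mathbf{x}_p = \begin{pmatrix} -7 \\ 4 \\ 0 \\ 0 \end{pmatrix}. \]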


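Now we can get another particular solution to the system by taking s = 1, t = 1. This gives

\[ \mathbf{y}_p = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{x}_p = \begin{pmatrix} 8 \\ -3 \\ 1 \\ 1 \end{pmatrix}. \]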


We can rewrite the general solution as

\[
\begin{aligned}
\mathbf{x} &= (s - 1 + 1)\mathbf{v}_1 + (t - 1 + 1)\mathbf{v}_2 + \mathbf{x}_p \\
&= (s - 1)\mathbf{v}_1 + (t - 1)\mathbf{v}_2 + \mathbf{y}_p \\
&= \hat{s}\mathbf{v}_1 + \hat{t}\mathbf{v}_2 + \mathbf{y}_p
\end{aligned}
\]

As $\hat{s}$ and $\hat{t}$ run over all possible pairs of real numbers we get exactly the same set of solutions as before. So the general solution can be written as $\mathbf{y}_p + \mathbf{x}_H$ as well as $\mathbf{x}_p + \mathbf{x}_H$! This is a bit confusing until you remember that these are sets of solutions, rather than single solutions; $(\hat{s}, \hat{t})$ and $(s, t)$ are just different sets of coordinates. But running through either set of coordinates (or parameters) produces the same set.


Remarks 


• Those of you taking a course in differential equations will encounter a similar situation: the general solution to a linear differential equation has the form $y = y_p + y_h$, where $y_p$ is any particular solution to the DE, and $y_h$ denotes the set of all solutions to the homogeneous DE.


Figure 1: The lower plane (the one passing through 0) represents $\mathbf{x}_H$. Given the particular solution $\mathbf{x}_p$ and a z in $\mathbf{x}_H$, we get another solution to the inhomogeneous equation. As z varies in $\mathbf{x}_H$, we get all the solutions to Ax = y.


• We can visualize the general solutions to the homogeneous and inhomogeneous equations we've worked out in detail as follows. The set $\mathbf{x}_H$ is a 2-plane in $\mathbb{R}^4$ which goes through the origin since x = 0 is a solution. The general solution to Ax = y is obtained by adding the vector $\mathbf{x}_p$ to every point in this 2-plane. Geometrically, this gives another 2-plane parallel to the first, but not containing the origin (since x = 0 is not a solution to Ax = y unless y = 0). Now pick any point in this parallel 2-plane and add to it all the vectors in the 2-plane corresponding to $\mathbf{x}_H$. What do you get? You get the same parallel 2-plane! This is why $\mathbf{x}_p + \mathbf{x}_H = \mathbf{y}_p + \mathbf{x}_H$.

7 Square matrices, inverses and related matters


Square matrices are the only matrices that can have inverses, and for this reason, they are 


a bit special. 


In a system of linear algebraic equations, if the number of equations equals the number of 
unknowns, then the associated coefficient matrix A is square. Suppose we row reduce A to 


its Gauss-Jordan form. There are two possible outcomes: 


1. The Gauss-Jordan form for $A_{n \times n}$ is the n × n identity matrix $I_n$ (commonly written as just I).


2. The Gauss-Jordan form for A has at least one row of zeros. 


The second case is clear: The GJ form of $A_{n \times n}$ can have at most n leading entries. If the GJ form of A is not I, then the GJ form has n − 1 or fewer leading entries, and therefore has a row of zeros.


In the first case, we can show that A is invertible. To see this, remember that A is reduced to GJ form by multiplication on the left by a finite number of elementary matrices. If the GJ form is I, then we have an expression like

\[ E_k E_{k-1} \cdots E_2 E_1 A = I, \]

where $E_i$ is the matrix corresponding to the i-th row operation used in the reduction. If we set $B = E_k E_{k-1} \cdots E_2 E_1$, then clearly BA = I and so $B = A^{-1}$. Furthermore, multiplying BA on the left by (note the order!!!) $E_k^{-1}$, then by $E_{k-1}^{-1}$, and continuing to $E_1^{-1}$, we undo all the row operations that brought A to GJ form, and we get back A:

\[
\begin{aligned}
(E_1^{-1} E_2^{-1} \cdots E_{k-1}^{-1} E_k^{-1})\, BA &= (E_1^{-1} E_2^{-1} \cdots E_{k-1}^{-1} E_k^{-1})\, I, \quad \text{or} \\
(E_1^{-1} E_2^{-1} \cdots E_{k-1}^{-1} E_k^{-1})(E_k E_{k-1} \cdots E_2 E_1)\, A &= E_1^{-1} E_2^{-1} \cdots E_{k-1}^{-1} E_k^{-1} \\
I A &= E_1^{-1} E_2^{-1} \cdots E_{k-1}^{-1} E_k^{-1} \\
A &= E_1^{-1} E_2^{-1} \cdots E_{k-1}^{-1} E_k^{-1}
\end{aligned}
\]

We summarize this in a

Theorem: The following are equivalent (i.e., each of the statements below implies and is implied by any of the others)

• The square matrix A is invertible.

• The Gauss-Jordan or reduced echelon form of A is the identity matrix.

• A can be written as a product of elementary matrices.


Example - (fill in the details on your scratch paper) 


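We start with

\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}. \]

We multiply row 1 by 1/2 using the matrix $E_1$:

\[ E_1 A = \begin{pmatrix} \tfrac{1}{2} & 0 \\ 0 & 1 \end{pmatrix} A = \begin{pmatrix} 1 & \tfrac{1}{2} \\ 1 & 2 \end{pmatrix}. \]

We now add −(row 1) to row 2, using $E_2$:

\[ E_2 E_1 A = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & \tfrac{1}{2} \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{3}{2} \end{pmatrix}. \]

Now multiply the second row by 2/3:

\[ E_3 E_2 E_1 A = \begin{pmatrix} 1 & 0 \\ 0 & \tfrac{2}{3} \end{pmatrix} \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{3}{2} \end{pmatrix} = \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{pmatrix}. \]

And finally, add −(1/2)(row 2) to row 1:

\[ E_4 E_3 E_2 E_1 A = \begin{pmatrix} 1 & -\tfrac{1}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]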


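So

\[ A^{-1} = E_4 E_3 E_2 E_1 = \begin{pmatrix} 1 & -\tfrac{1}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \tfrac{2}{3} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}. \]

Exercises:

1. Check the last expression by multiplying the elementary matrices.

2. Write A as the product of elementary matrices.

3. The individual factors in the product of $A^{-1}$ are not unique. They depend on how we do the row reduction. Find another factorization of $A^{-1}$. (Hint: Start things out a different way, for example by adding −(row 2) to row 1.)

4. Let

\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix}. \]

Express both A and $A^{-1}$ as the products of elementary matrices.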
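For readers who want to confirm a hand computation, the factorization in the worked example for A = (2 1; 1 2) can be multiplied out by machine. This is a minimal sketch under stated assumptions: the four elementary matrices are the ones corresponding to the row operations "multiply row 1 by 1/2", "add −(row 1) to row 2", "multiply row 2 by 2/3", and "add −(1/2)(row 2) to row 1", and `matmul` is a helper name introduced here.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[F(2), F(1)], [F(1), F(2)]]

E1 = [[F(1, 2), F(0)], [F(0), F(1)]]    # multiply row 1 by 1/2
E2 = [[F(1), F(0)], [F(-1), F(1)]]      # add -(row 1) to row 2
E3 = [[F(1), F(0)], [F(0), F(2, 3)]]    # multiply row 2 by 2/3
E4 = [[F(1), F(-1, 2)], [F(0), F(1)]]   # add -(1/2)(row 2) to row 1

# Note the order: the first operation applied sits closest to A
B = matmul(E4, matmul(E3, matmul(E2, E1)))

print(B)
print(matmul(B, A))   # should be the identity if B is the inverse
```

The same helper can be reused for the other exercises by changing the list of elementary matrices.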


7.1 Solutions to Ax = y when A is square

• If A is invertible, then the equation Ax = y has the unique solution $A^{-1}\mathbf{y}$ for any right hand side y. For,

\[ A\mathbf{x} = \mathbf{y} \;\Rightarrow\; A^{-1}A\mathbf{x} = A^{-1}\mathbf{y} \;\Rightarrow\; \mathbf{x} = A^{-1}\mathbf{y}. \]

In this case, the solution to the homogeneous equation is also unique - it's the trivial solution.

• If A is not invertible, then there is at least one free variable. So there are non-trivial solutions to Ax = 0. If y ≠ 0, then either Ax = y is inconsistent (the most likely case) or solutions to the system exist, but there are infinitely many.

• Exercise: If the square matrix A is not invertible, why is it "likely" that the inhomogeneous equation is inconsistent? "Likely", in this case, means that the system should be inconsistent for a y chosen at random.

7.2 An algorithm for constructing $A^{-1}$


The work we've just done leads us immediately to an algorithm for constructing the inverse of A. (You've probably seen this before, but now you'll know why it works!) It's based on the following observation: suppose $B_{n \times p}$ is another matrix with the same number of rows as $A_{n \times n}$, and $E_{n \times n}$ is an elementary matrix which can multiply A on the left. Then E can also multiply B on the left, and if we form the partitioned matrix

\[ C = (A \,\vdots\, B)_{n \times (n+p)}, \]

then, in what should be an obvious notation, we have

\[ EC = (EA \,\vdots\, EB)_{n \times (n+p)}, \]

where EA is n × n and EB is n × p. (Exercise: Check this for yourself with a simple example. Better yet, prove it in general.)

The algorithm consists of forming the partitioned matrix $C = (A \,\vdots\, I)$, and doing the row operations that reduce A to Gauss-Jordan form on the larger matrix C. If A is invertible, we'll end up with

\[
\begin{aligned}
E_k \cdots E_1 (A \,\vdots\, I) &= (E_k \cdots E_1 A \,\vdots\, E_k \cdots E_1 I) \\
&= (I \,\vdots\, A^{-1}).
\end{aligned}
\]

In words: the same sequence of row operations that reduces A to I will convert I to $A^{-1}$. The advantage to doing things this way is that you don't have to write down the elementary matrices. They're working away in the background, as we know from the theory, but if all we want is $A^{-1}$, then we don't need them explicitly; we just do the row operations.
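The algorithm just described can be sketched in a few lines of Python. This is an illustrative implementation, not the author's: the function name `inverse` is made up here, exact fractions are used to avoid rounding, and returning `None` for a non-invertible matrix is a design choice of this sketch.

```python
from fractions import Fraction

def inverse(A):
    """Row reduce the partitioned matrix (A : I); return A^{-1}, or None if A is singular."""
    n = len(A)
    # Build C = (A : I) with exact rational entries
    C = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # find a row at or below the diagonal with a non-zero entry in column c
        pr = next((i for i in range(c, n) if C[i][c] != 0), None)
        if pr is None:
            return None  # the GJ form of A has a row of zeros: no inverse
        C[c], C[pr] = C[pr], C[c]              # row interchange
        piv = C[c][c]
        C[c] = [v / piv for v in C[c]]         # scale to get a leading 1
        for i in range(n):
            if i != c and C[i][c] != 0:
                f = C[i][c]
                C[i] = [a - f * b for a, b in zip(C[i], C[c])]  # clear the column
    # The left block is now I, so the right block is A^{-1}
    return [row[n:] for row in C]

inv = inverse([[2, 1], [1, 2]])
print(inv)
print(inverse([[1, 2], [2, 4]]))   # a singular matrix
```

No elementary matrix is ever written down; each row operation acts on both blocks at once, which is exactly the observation EC = (EA : EB).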


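Example: Let

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & -1 \\ 2 & 3 & 1 \end{pmatrix}. \]

Then row reducing $(A \,\vdots\, I)$, we get

\[ (A \,\vdots\, I) = \left(\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 1 & 0 \\ 2 & 3 & 1 & 0 & 0 & 1 \end{array}\right) \]

\[ r_1 \leftrightarrow r_2: \quad \left(\begin{array}{rrr|rrr} 1 & 0 & -1 & 0 & 1 & 0 \\ 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 3 & 1 & 0 & 0 & 1 \end{array}\right) \]

\[ \text{do col 1:} \quad \left(\begin{array}{rrr|rrr} 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 2 & 4 & 1 & -1 & 0 \\ 0 & 3 & 3 & 0 & -2 & 1 \end{array}\right) \]

\[ \text{do column 2:} \quad \left(\begin{array}{rrr|rrr} 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 1 & 2 & \tfrac{1}{2} & -\tfrac{1}{2} & 0 \\ 0 & 0 & -3 & -\tfrac{3}{2} & -\tfrac{1}{2} & 1 \end{array}\right) \]

\[ \text{and column 3:} \quad \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & \tfrac{1}{2} & \tfrac{7}{6} & -\tfrac{1}{3} \\ 0 & 1 & 0 & -\tfrac{1}{2} & -\tfrac{5}{6} & \tfrac{2}{3} \\ 0 & 0 & 1 & \tfrac{1}{2} & \tfrac{1}{6} & -\tfrac{1}{3} \end{array}\right) \]

So,

\[ A^{-1} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{7}{6} & -\tfrac{1}{3} \\ -\tfrac{1}{2} & -\tfrac{5}{6} & \tfrac{2}{3} \\ \tfrac{1}{2} & \tfrac{1}{6} & -\tfrac{1}{3} \end{pmatrix}. \]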

Exercise: Write down a 2 × 2 matrix and do this yourself. Same with a 3 × 3 matrix.

8 Square matrices continued: Determinants


8.1 Introduction 


Determinants give us important information about square matrices, and, as we’ll see in the 
next lecture, are essential for the computation of eigenvalues. You have seen determinants 


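in your precalculus courses. For a 2 × 2 matrix

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

the formula reads

\[ \det(A) = ad - bc. \]

For a 3 × 3 matrix

\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \]

life is more complicated. Here the formula reads

\[ \det(A) = a_{11}a_{22}a_{33} + a_{13}a_{21}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} - a_{13}a_{22}a_{31}. \]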


Things get worse quickly as the dimension increases. For an n × n matrix A, the expression for det(A) has n factorial = n! = 1·2·…·(n−1)·n terms, each of which is a product of n matrix entries. Even on a computer, calculating the determinant of a 10 × 10 matrix using this sort of formula would be unnecessarily time-consuming, and doing a 1000 × 1000 matrix would take years!
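To make the n! growth concrete, here is a small sketch of the "sum over all terms" formula. It uses the standard permutation expansion of the determinant, which these notes do not spell out, so treat it as background; the function name `det_by_permutations` is made up for this example.

```python
from itertools import permutations
from math import factorial

def det_by_permutations(A):
    """Sum of n! signed products, one entry taken from each row and each column."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        # sign of the permutation: +1 for an even number of inversions, -1 for odd
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= A[row][col]
        total += term
    return total

print(det_by_permutations([[2, 1], [3, -4]]))
print(det_by_permutations([[1, 2, 3], [1, 0, -1], [2, 3, 1]]))

# Number of terms the formula needs for a 10 x 10 matrix
terms_10 = factorial(10)
print(terms_10)
```

For n = 2 and n = 3 this reproduces the two formulas above; the loop over `permutations` is what makes the cost explode as n grows.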


Fortunately, as we'll see below, computing the determinant is easy if the matrix happens to be in echelon form. You just need to do a little bookkeeping on the side as you reduce the matrix to echelon form.

8.2 The definition of det(A)


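Let A be n × n, and write $\mathbf{r}_1$ for the first row, $\mathbf{r}_2$ for the second row, etc. The determinant of A is a real-valued function of the rows of A which we write as

\[ \det(A) = \det(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n). \]

It is completely determined by the following four properties:

1. Multiplying a row by the constant c multiplies the determinant by c:

\[ \det(\mathbf{r}_1, \mathbf{r}_2, \ldots, c\mathbf{r}_i, \ldots, \mathbf{r}_n) = c \det(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_i, \ldots, \mathbf{r}_n) \]

2. If row i is the sum of $\mathbf{r}_i$ and $\mathbf{y}_i$, then the determinant is the sum of the two corresponding determinants:

\[ \det(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_i + \mathbf{y}_i, \ldots, \mathbf{r}_n) = \det(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_i, \ldots, \mathbf{r}_n) + \det(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{y}_i, \ldots, \mathbf{r}_n) \]

(These two properties are summarized by saying that the determinant is a linear function of each row.)

3. Interchanging any two rows of the matrix changes the sign of the determinant:

\[ \det(\ldots, \mathbf{r}_i, \ldots, \mathbf{r}_j, \ldots) = -\det(\ldots, \mathbf{r}_j, \ldots, \mathbf{r}_i, \ldots) \]

4. The determinant of the n × n identity matrix is 1.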


8.3 Some consequences of the definition 


• If A has a row of zeros, then det(A) = 0: Because if A = (…, 0, …), then also A = (…, c0, …) for any c, and therefore det(A) = c det(A) for any c (property 1). This can only happen if det(A) = 0.

• If $\mathbf{r}_i = \mathbf{r}_j$, i ≠ j, then det(A) = 0: Because then $\det(A) = \det(\ldots, \mathbf{r}_i, \ldots, \mathbf{r}_j, \ldots) = -\det(\ldots, \mathbf{r}_j, \ldots, \mathbf{r}_i, \ldots)$, by property 3, so det(A) = −det(A) which means det(A) = 0.

• If B is obtained from A by replacing row i with row i + c(row j), then det(B) = det(A):

\[
\begin{aligned}
\det(B) &= \det(\ldots, \mathbf{r}_i + c\mathbf{r}_j, \ldots, \mathbf{r}_j, \ldots) \\
&= \det(\ldots, \mathbf{r}_i, \ldots, \mathbf{r}_j, \ldots) + \det(\ldots, c\mathbf{r}_j, \ldots, \mathbf{r}_j, \ldots) \\
&= \det(A) + c \det(\ldots, \mathbf{r}_j, \ldots, \mathbf{r}_j, \ldots) \\
&= \det(A) + 0.
\end{aligned}
\]

The second determinant vanishes because both the i-th and j-th rows are equal to $\mathbf{r}_j$.

• Theorem: The determinant of an upper or lower triangular matrix with non-zero entries on the main diagonal is equal to the product of the entries on the main diagonal.


Proof: Suppose A is upper triangular. This means all the entries beneath the main 
diagonal are zero. This means we can clean out each column above the diagonal by 
using a row operation of the type just considered above. The end result is a matrix 
with the original non zero entries on the main diagonal and zeros elsewhere. Then 


repeated use of property 1 gives the result. 


Remark: This is the property we use to compute determinants, because, as we know, 


row reduction leads to an upper triangular matrix. 


Exercise: Show that if A is an upper triangular matrix with one or more 0s on the main diagonal, then det(A) = 0.
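The bookkeeping described in the remark above (reduce to triangular form, track what each row operation does to the determinant) can be sketched as follows. This is an illustrative implementation with a made-up name, `det_by_row_reduction`; it uses only row swaps and "add a multiple of one row to another", so the only bookkeeping needed is a sign.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Reduce A to upper triangular form, tracking sign changes, then multiply the diagonal."""
    M = [[Fraction(v) for v in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        pr = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pr is None:
            # no pivot available: the triangular form has a 0 on its diagonal
            return Fraction(0)
        if pr != c:
            M[c], M[pr] = M[pr], M[c]
            sign = -sign   # property 3: interchanging two rows flips the sign
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            # replacing row i by row i - f*(row c) leaves the determinant unchanged
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    product = Fraction(sign)
    for i in range(n):
        product *= M[i][i]   # triangular matrix: product of the diagonal entries
    return product

print(det_by_row_reduction([[2, 1], [3, -4]]))
print(det_by_row_reduction([[1, 2, 3], [1, 0, -1], [2, 3, 1]]))
```

The work here grows roughly like n³ rather than n!, which is why row reduction is the practical method.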


Examples 


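1. Let

\[ A = \begin{pmatrix} 2 & 1 \\ 3 & -4 \end{pmatrix}. \]

Note that row 1 = (2, 1) = 2(1, 1/2), so that

\[
\begin{aligned}
\det(A) &= 2 \det \begin{pmatrix} 1 & \tfrac{1}{2} \\ 3 & -4 \end{pmatrix} \\
&= 2 \det \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & -\tfrac{11}{2} \end{pmatrix} \\
&= (2)\left(-\tfrac{11}{2}\right) \det \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{pmatrix} \\
&= -11.
\end{aligned}
\]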

Exercise: Justify each of the above steps. 


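2. We can derive the formula for a 2 × 2 determinant in the same way: Let

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

Then

\[
\begin{aligned}
\det(A) &= a \det \begin{pmatrix} 1 & \tfrac{b}{a} \\ c & d \end{pmatrix} \\
&= a \det \begin{pmatrix} 1 & \tfrac{b}{a} \\ 0 & d - \tfrac{bc}{a} \end{pmatrix} \\
&= a\left(d - \tfrac{bc}{a}\right) = ad - bc.
\end{aligned}
\]

Exercises: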


• Suppose a = 0 in the matrix A. Then we can't divide by a and the above computation won't work. Show that it's still true that det(A) = ad − bc.

• Show that the three types of elementary matrices all have nonzero determinants.

• Suppose that $\text{row}_k(A)$ is a linear combination of rows i and j, where i, j, and k are distinct: so $\mathbf{r}_k = a\mathbf{r}_i + b\mathbf{r}_j$. Show that det(A) = 0.

There are two other important properties of the determinant, which we won't prove here (you can find the proofs in more advanced linear algebra texts):

• The determinant of A is the same as that of its transpose $A^T$.

• If A and B are square matrices of the same size, then

\[ \det(AB) = \det(A)\det(B) \]

From the second of these, it follows that if A is invertible, then $\det(AA^{-1}) = \det(I) = 1 = \det(A)\det(A^{-1})$, so $\det(A^{-1}) = 1/\det(A)$.


Definition: If the (square) matrix A is invertible, then A is said to be non-singular. Other- 


wise, A is singular. 


Exercises: 


• Show that A is invertible ⟺ det(A) ≠ 0. (Hint: use the properties of determinants together with the theorem on GJ form and existence of the inverse.)

• A is singular ⟺ the homogeneous equation Ax = 0 has nontrivial solutions. (Hint: If you don't want to do this directly, make an argument that this statement is logically equivalent to: A is non-singular ⟺ the homogeneous equation has only the trivial solution.)

• Compute the determinants of the following matrices using the properties of the determinant; justify your work:

\[ \begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & -1 \\ 2 & 3 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & -3 & 0 \\ 2 & 6 & 0 & 1 \\ 1 & 4 & 3 & 1 \\ 2 & 4 & 6 & 8 \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ \pi & 4 & 0 \\ 3 & 7 & 5 \end{pmatrix} \]

9 The derivative as a linear transformation


9.1 Redefining the derivative 


Matrices appear in many different contexts in mathematics, not just when we need to solve 
a system of linear equations. An important instance is linear approximation. Recall from 
your calculus course that a differentiable function f can be expanded about any point a in 


its domain using Taylor’s theorem. We can write 


\[ f(x) = f(a) + f'(a)(x - a) + \frac{f''(c)}{2!}(x - a)^2, \]

where c is some point between x and a. The remainder term $\frac{f''(c)}{2!}(x - a)^2$ can be thought of as the "error" made by using the linear approximation to f at x = a,

\[ f(x) \approx f(a) + f'(a)(x - a). \]

In fact, we can write Taylor's theorem in the more suggestive form

\[ f(x) = f(a) + f'(a)(x - a) + e(x, a), \]

where the error function e(x, a) has the important property

\[ \lim_{x \to a} \frac{e(x, a)}{x - a} = 0. \]

(The existence of this limit is another way of saying that the error function "looks like" $(x - a)^2$.)
This observation gives us an alternative (and in fact, much better) definition of the derivative: 


Definition: The real-valued function f is said to be differentiable at x = a if there exists a number A and a function e(x, a) such that

\[ f(x) = f(a) + A(x - a) + e(x, a), \]

where

\[ \lim_{x \to a} \frac{e(x, a)}{x - a} = 0. \]


Theorem: This is equivalent to the usual calculus definition. 


Proof: If the new definition holds, then

\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \left( A + \frac{e(x, a)}{x - a} \right) = A + 0 = A, \]

and A = f'(a) according to the standard definition. Conversely, if the standard definition of differentiability holds, then define e to be the error made in the linear approximation:

\[ e(x, a) = f(x) - f(a) - f'(a)(x - a). \]

Then

\[ \lim_{x \to a} \frac{e(x, a)}{x - a} = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} - f'(a) \right) = f'(a) - f'(a) = 0, \]

so f can be written in the new form, with A = f'(a).


Example: Let $f(x) = 4 + 2x - x^2$, and let a = 1. Then we can get a "linear approximation" by taking any number, say 43, and using it for A, writing f(x) = f(1) + 43(x − 1) + "error term", where by definition, the error term is what's left: that is,

\[ f(x) - f(1) - 43(x - 1) = 4 + 2x - x^2 - 5 - 43(x - 1) = 42 - 41x - x^2. \]

But you can see that if we were to define

\[ e(x) = 42 - 41x - x^2 \quad (= 42(1 - x) + x(1 - x)), \]

then

\[ \lim_{x \to 1} \frac{e(x)}{x - 1} = \lim_{x \to 1} (-42 - x) = -43, \]

which, you will notice, is not 0. The error term, instead of being purely quadratic in x − 1 (as required by the definition of differentiability), has a linear term: Using Taylor's theorem to expand e(x) about x = 1, we get (exercise)

\[ e = 42 - 41x - x^2 = -43(x - 1) - (x - 1)^2. \]

The only choice for the linear approximation in which the error term is purely quadratic is

\[ f(x) \approx f(1) + f'(1)(x - 1). \]
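The contrast between the "wrong" slope 43 and the derivative can also be seen numerically. The sketch below is illustrative only; the helper name `error_ratio` is made up, and it uses the fact that f'(1) = 2 − 2·1 = 0 for the function in this example.

```python
def f(x):
    return 4 + 2 * x - x ** 2

def error_ratio(A, x, a=1.0):
    """e(x, a) / (x - a) for the candidate linear approximation f(a) + A (x - a)."""
    e = f(x) - f(a) - A * (x - a)
    return e / (x - a)

# Let x approach a = 1 and watch the ratio for two candidate slopes
for h in (1e-1, 1e-3, 1e-5):
    x = 1 + h
    print(h, error_ratio(43, x), error_ratio(0, x))
```

With A = 43 the ratio settles near −43 instead of 0; with A = f'(1) = 0 it shrinks with h, as the definition of differentiability requires.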
Exercise: Interpret this geometrically in terms of the slope of various lines passing through the point (1, f(1)).

9.2 Generalization to higher dimensions


Our new definition of derivative is the one which generalizes to higher dimensions. We start 


with an 


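Example: Consider a function from $\mathbb{R}^2$ to $\mathbb{R}^2$, say

\[ \mathbf{f}(\mathbf{x}) = \mathbf{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u(x, y) \\ v(x, y) \end{pmatrix} = \begin{pmatrix} 2 + x + 4y + 4x^2 + 5xy - y^2 \\ 1 - x + 2y - 2x^2 + 3xy + y^2 \end{pmatrix} \]

By inspection, as it were, we can separate the right hand side into three parts. We have

\[ \mathbf{f}(\mathbf{0}) = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \]

and the linear part of f is the vector

\[ \begin{pmatrix} x + 4y \\ -x + 2y \end{pmatrix} \]

which can be written in matrix form as

\[ A\mathbf{x} = \begin{pmatrix} 1 & 4 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \]

By analogy with the one-dimensional case, we might guess that

\[ \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{0}) + A\mathbf{x} + \text{an error term of order 2 in } x, y, \]

where A is the matrix

\[ A = \begin{pmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{pmatrix} (0, 0). \]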


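And this suggests the following

Definition: A function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is said to be differentiable at the point $\mathbf{x} = \mathbf{a} \in \mathbb{R}^n$ if there exists an m × n matrix A and a function e(x, a) such that

\[ \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a}) + A(\mathbf{x} - \mathbf{a}) + \mathbf{e}(\mathbf{x}, \mathbf{a}), \]

where

\[ \lim_{\|\mathbf{x} - \mathbf{a}\| \to 0} \frac{\mathbf{e}(\mathbf{x}, \mathbf{a})}{\|\mathbf{x} - \mathbf{a}\|} = \mathbf{0}. \]

The matrix A is called the derivative of f at x = a, and is denoted by Df(a).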


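Generalizing the one-dimensional case, it can be shown that if

\[ \mathbf{f}(\mathbf{x}) = \begin{pmatrix} u_1(\mathbf{x}) \\ \vdots \\ u_m(\mathbf{x}) \end{pmatrix} \]

is differentiable at x = a, then the derivative of f is given by the m × n matrix of partial derivatives

\[ D\mathbf{f}(\mathbf{a}) = \begin{pmatrix} \dfrac{\partial u_1}{\partial x_1} & \cdots & \dfrac{\partial u_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial u_m}{\partial x_1} & \cdots & \dfrac{\partial u_m}{\partial x_n} \end{pmatrix}_{m \times n} (\mathbf{a}). \]

Conversely, if all the indicated partial derivatives exist and are continuous at x = a, then the approximation

\[ \mathbf{f}(\mathbf{x}) \approx \mathbf{f}(\mathbf{a}) + D\mathbf{f}(\mathbf{a})(\mathbf{x} - \mathbf{a}) \]

is accurate to the second order in x − a.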


Exercise: Find the derivative of the function $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3$ at a = (1, 2), where

10 Subspaces


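Definitions:

• A linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is any vector of the form $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_m\mathbf{v}_m$, where $c_1, \ldots, c_m \in \mathbb{R}$.

• A subset V of $\mathbb{R}^n$ is a subspace if, whenever $\mathbf{v}_1, \mathbf{v}_2 \in V$, and $c_1$ and $c_2$ are any real numbers, the linear combination $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 \in V$.

Remark: Suppose that V is a subspace, and that $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ all belong to V. Then $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 \in V$. Therefore, $(c_1\mathbf{x}_1 + c_2\mathbf{x}_2) + c_3\mathbf{x}_3 \in V$. Similarly, $(c_1\mathbf{x}_1 + \ldots + c_{m-1}\mathbf{x}_{m-1}) + c_m\mathbf{x}_m \in V$. We say that the subspace V is closed under linear combinations.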


Examples: 


• The set of all solutions to the homogeneous equation Ax = 0 is a subspace of $\mathbb{R}^n$ if A is m × n.

Proof: Suppose $\mathbf{x}_1$ and $\mathbf{x}_2$ are solutions; we need to show that $c_1\mathbf{x}_1 + c_2\mathbf{x}_2$ is also a solution. Because $\mathbf{x}_1$ is a solution, $A\mathbf{x}_1 = \mathbf{0}$. Similarly, $A\mathbf{x}_2 = \mathbf{0}$. Then for any scalars $c_1, c_2$, $A(c_1\mathbf{x}_1 + c_2\mathbf{x}_2) = c_1A\mathbf{x}_1 + c_2A\mathbf{x}_2 = c_1\mathbf{0} + c_2\mathbf{0} = \mathbf{0}$. So $c_1\mathbf{x}_1 + c_2\mathbf{x}_2$ is also a solution. The set of solutions is closed under linear combinations and so it's a subspace.

Definition: This important subspace is called the null space of A, and is denoted Null(A).


• The set V of all vectors in $\mathbb{R}^3$ which are orthogonal (perpendicular) to a fixed vector v is a subspace of $\mathbb{R}^3$.

Proof: x is orthogonal to v ($\mathbf{x} \perp \mathbf{v}$) means that $\mathbf{x} \cdot \mathbf{v} = 0$. So suppose that $\mathbf{x}_1$ and $\mathbf{x}_2$ are orthogonal to v. Then, using the properties of the dot product, for any constants $c_1, c_2$, we have

\[ (c_1\mathbf{x}_1 + c_2\mathbf{x}_2) \cdot \mathbf{v} = c_1(\mathbf{x}_1 \cdot \mathbf{v}) + c_2(\mathbf{x}_2 \cdot \mathbf{v}) = c_1 0 + c_2 0 = 0. \]

And therefore $(c_1\mathbf{x}_1 + c_2\mathbf{x}_2) \perp \mathbf{v}$, so we have a subspace.

• The set consisting of the single vector 0 is a subspace of $\mathbb{R}^n$ for any n: any linear combination of elements of this set is a multiple of 0, and hence equal to 0 which is in the set.

• $\mathbb{R}^n$ is a subspace of itself since any linear combination of vectors in the set is again in the set.

• Take any finite or infinite set $S \subset \mathbb{R}^n$.

Definition: The span of S is the set of all finite linear combinations of elements of S:

\[ \text{span}(S) = \left\{ \mathbf{x} : \mathbf{x} = \sum_{i=1}^{n} c_i\mathbf{v}_i, \text{ where } \mathbf{v}_i \in S, \text{ and } n < \infty \right\} \]

Exercise: Show that span(S) is a subspace of $\mathbb{R}^n$.


Definition: If V = span(S), then the vectors in S are said to span the subspace V. (So the word "span" is used in 2 ways.)

Example: Referring back to the section on solutions to the homogeneous equation, we had an example for which the general solution to Ax = 0 took the form

\[ \mathbf{x}_H = \{ s\mathbf{v}_1 + t\mathbf{v}_2, \; t, s \in \mathbb{R} \} \]

So $\mathbf{x}_H = \text{span}(\mathbf{v}_1, \mathbf{v}_2)$. And, of course, $\mathbf{x}_H = \text{Null}(A)$ is just the null space of the matrix A. (We will not use the obscure notation $\mathbf{x}_H$ for this subspace any longer.)


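How can you tell if a particular vector belongs to span(S)? You have to show that you can (or cannot) write it as a linear combination of vectors in S.

Example: Is

\[ \mathbf{v} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \]

in the span of

\[ \left\{ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix} \right\} = \{\mathbf{x}_1, \mathbf{x}_2\}? \]

Answer: It is if there exist numbers $c_1$ and $c_2$ such that $\mathbf{v} = c_1\mathbf{x}_1 + c_2\mathbf{x}_2$. Writing this out gives a system of linear equations:

\[ \mathbf{v} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + c_2 \begin{pmatrix} 2 \\ -1 \\ 2 \end{pmatrix} \]

In matrix form, this reads

\[ \begin{pmatrix} 1 & 2 \\ 0 & -1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \]

As you can (and should!) verify, this system is inconsistent. No such $c_1, c_2$ exist. So v is not in the span of these two vectors.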
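The span test in this example amounts to a consistency check, and one way to automate it is to compare ranks: v lies in the span exactly when appending v as an extra column does not raise the rank. The following sketch uses that reformulation (a standard fact, not the wording of these notes); the helper names `rank` and `in_span` are made up.

```python
from fractions import Fraction

def rank(rows):
    """Number of leading entries after forward elimination (exact arithmetic)."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        pr = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def in_span(v, vectors):
    """True if v is a linear combination of the given vectors."""
    # coefficient matrix: the given vectors are its columns
    coeff = [[vec[i] for vec in vectors] for i in range(len(v))]
    augmented = [row + [v[i]] for i, row in enumerate(coeff)]
    return rank(coeff) == rank(augmented)

x1 = [1, 0, 1]
x2 = [2, -1, 2]

print(in_span([1, 2, 3], [x1, x2]))    # the vector v from the example
print(in_span([5, -2, 5], [x1, x2]))   # x1 + 2*x2, which is in the span by construction
```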


• The set of all solutions to the inhomogeneous system Ax = y, y ≠ 0 is not a subspace. To see this, suppose that $\mathbf{x}_1$ and $\mathbf{x}_2$ are two solutions. We'll have a subspace if any linear combination of these two vectors is again a solution. So we compute

\[
\begin{aligned}
A(c_1\mathbf{x}_1 + c_2\mathbf{x}_2) &= c_1A\mathbf{x}_1 + c_2A\mathbf{x}_2 \\
&= c_1\mathbf{y} + c_2\mathbf{y} \\
&= (c_1 + c_2)\mathbf{y}.
\end{aligned}
\]

Since for general $c_1, c_2$ the right hand side is not equal to y, this is not a subspace.


NOTE: To show that V is or is not a subspace does not, as a general rule, require any prodigious intellectual effort. Just assume that $\mathbf{x}_1, \mathbf{x}_2 \in V$, and see if $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 \in V$ for arbitrary scalars $c_1, c_2$. If so, it's a subspace, otherwise no. The scalars must be arbitrary, and $\mathbf{x}_1, \mathbf{x}_2$ must be arbitrary elements of V. (So you can't pick two of your favorite vectors and two of your favorite scalars for this proof - that's why we always use "generic" elements like $\mathbf{x}_1$ and $c_1$.)


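• Definition: Suppose A is m × n. The m rows of A form a subset of $\mathbb{R}^n$; the span of these vectors is called the row space of the matrix. Similarly, the n columns of A form a set of vectors in $\mathbb{R}^m$, and the space they span is called the column space of the matrix A.

Example: For the matrix

\[ A = \begin{pmatrix} 1 & 0 & -1 & 2 \\ 3 & 4 & 6 & -1 \\ 2 & 5 & -9 & 7 \end{pmatrix} \]

the row space of A is span$\{(1, 0, -1, 2)^t, (3, 4, 6, -1)^t, (2, 5, -9, 7)^t\}$², and the column space is

\[ \text{span}\left\{ \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 4 \\ 5 \end{pmatrix}, \begin{pmatrix} -1 \\ 6 \\ -9 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 7 \end{pmatrix} \right\} \]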
Exercises:

• A plane through 0 in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$. A plane which does not contain the origin is not a subspace. (Hint: what are the equations for these planes?)

• When is a line in $\mathbb{R}^2$ a subspace of $\mathbb{R}^2$?

² In many texts, vectors are written as row vectors for typographical reasons (it takes up less space). But for computations the vectors should always be written as columns, which is why the symbols for the transpose appear here.

11 Linear dependence and independence


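Definition: A finite set $S = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m\}$ of vectors in $\mathbb{R}^n$ is said to be linearly dependent if there exist scalars (real numbers) $c_1, c_2, \ldots, c_m$, not all of which are 0, such that $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \ldots + c_m\mathbf{x}_m = \mathbf{0}$.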


Examples: 


1. The vectors

\[ \mathbf{x}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \quad \text{and} \quad \mathbf{x}_3 = \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix} \]

are linearly dependent because $2\mathbf{x}_1 + \mathbf{x}_2 - \mathbf{x}_3 = \mathbf{0}$.

2. Any set containing the vector 0 is linearly dependent, because for any c ≠ 0, c0 = 0.

3. In the definition, we require that not all of the scalars $c_1, \ldots, c_m$ are 0. The reason for this is that otherwise, any set of vectors would be linearly dependent.

4. If a set of non-zero vectors is linearly dependent, then one of them can be written as a linear combination of the others. (We just do this for 3 vectors, but it is true for any number.) Suppose $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = \mathbf{0}$, where at least one of the c's is not zero. If, say, $c_2 \neq 0$, then we can solve for $\mathbf{x}_2$:

\[ \mathbf{x}_2 = (-1/c_2)(c_1\mathbf{x}_1 + c_3\mathbf{x}_3). \]

And similarly if some other coefficient is not zero.


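5. In principle, it is an easy matter to determine whether a set S is linearly dependent: We write down a system of linear algebraic equations and see if there are solutions. For instance, suppose

\[ S = \left\{ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\} = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}. \]

By the definition, S is linearly dependent ⟺ we can find scalars $c_1, c_2$, and $c_3$, not all 0, such that

\[ c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = \mathbf{0}. \]

We write this equation out in matrix form:

\[ \begin{pmatrix} 1 & 1 & 1 \\ 2 & 0 & 1 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

Evidently, the set S is linearly dependent if and only if there is a non-trivial solution to this homogeneous equation. Row reduction of the matrix leads quickly to

\[ \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 \end{pmatrix} \]

This matrix is non-singular, so the only solution to the homogeneous equation is the trivial one with $c_1 = c_2 = c_3 = 0$. So the vectors are not linearly dependent.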
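The dependence test in item 5 can be mechanized. A minimal sketch, under the assumption that we phrase "no free variables" as "the number of leading entries equals the number of vectors"; the helper names `rank` and `is_linearly_dependent` are invented for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Number of leading entries after forward elimination (exact arithmetic)."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        pr = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def is_linearly_dependent(vectors):
    """Dependent iff the homogeneous system with these vectors as columns has a free variable."""
    columns_matrix = [[vec[i] for vec in vectors] for i in range(len(vectors[0]))]
    return rank(columns_matrix) < len(vectors)

S = [[1, 2, 1], [1, 0, -1], [1, 1, 1]]   # the set from item 5
T = [[1, 1, 1], [1, -1, 2], [3, 1, 4]]   # the vectors from item 1

print(is_linearly_dependent(S))
print(is_linearly_dependent(T))
```

The second set is reported as dependent, in agreement with the explicit relation 2x₁ + x₂ − x₃ = 0 given in item 1.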


Definition: the set S is linearly independent if it’s not linearly dependent. 


What could be clearer? The set S is not linearly dependent if, whenever some linear combination of the elements of S adds up to 0, it turns out that $c_1, c_2, \ldots$ are all zero. In the last example above, we assumed that $c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 = \mathbf{0}$ and were led to the conclusion that all the coefficients must be 0. So this set is linearly independent.

The "test" for linear independence is the same as that for linear dependence. We set up a homogeneous system of equations, and find out whether or not it has non-trivial solutions.


Exercises: 


1. A set S consisting of two different vectors u and v is linearly dependent <= one of 


the two is a nonzero multiple of the other. (Don’t forget the possibility that one of 
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the vectors could be 0). If neither vector is 0, the vectors are linearly dependent if 


they are parallel. What is the geometric condition for three nonzero vectors in R® to 


be linearly dependent? 


2. Find two linearly independent vectors belonging to the null space of the matrix 


3 2-1 4 
A= 1 0 2 3 
=o =2: o =i 


3. Are the columns of A (above) linearly independent in $R^3$? Why? Are the rows of A
linearly independent in $R^4$? Why?


11.1 Elementary row operations 


We can show that elementary row operations performed on a matrix A don’t change the row 


space. We just give the proof for one of the operations; the other two are left as exercises. 


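Suppose that, in the matrix A, $\mathrm{row}_i(A)$ is replaced by $\mathrm{row}_i(A) + c \cdot \mathrm{row}_j(A)$. Call the resulting
matrix B. If x belongs to the row space of A, then

\[ x = c_1 \mathrm{row}_1(A) + \cdots + c_i \mathrm{row}_i(A) + \cdots + c_j \mathrm{row}_j(A) + \cdots + c_m \mathrm{row}_m(A). \]

Now add and subtract $c \cdot c_i \cdot \mathrm{row}_j(A)$ to get

\[ x = c_1 \mathrm{row}_1(A) + \cdots + c_i \mathrm{row}_i(A) + c \, c_i \mathrm{row}_j(A) + \cdots + (c_j - c_i c)\,\mathrm{row}_j(A) + \cdots + c_m \mathrm{row}_m(A) \]
\[ \phantom{x} = c_1 \mathrm{row}_1(B) + \cdots + c_i \mathrm{row}_i(B) + \cdots + (c_j - c_i c)\,\mathrm{row}_j(B) + \cdots + c_m \mathrm{row}_m(B). \]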

This shows that x can also be written as a linear combination of the rows of B. So any 
element in the row space of A is contained in the row space of B. 

Exercise: Show the converse - that any element in the row space of B is contained in the 


row space of A. 
Definition: Two sets X and Y are equal if X ⊆ Y and Y ⊆ X.


This is what we’ve just shown for the two row spaces. 


Exercises: 


1. Show that the other two elementary row operations don’t change the row space of A. 


2. **Show that when we multiply any matrix A by another matrix B on the left, the rows 


of the product BA are linear combinations of the rows of A. 


3. **Show that when we multiply A on the right by B, the columns of AB are linear
combinations of the columns of A.
12 Basis and dimension of subspaces


12.1 The concept of basis 


It follows from what we've said above that if $S = \{e_1, \ldots, e_m\}$ spans the subspace V [3] but
is linearly dependent, we can express one of the elements in S as a linear combination of the
others. By relabeling if necessary, we suppose that $e_m$ can be written as a linear combination
of the others. Then

\[ \mathrm{span}(S) = \mathrm{span}(e_1, \ldots, e_{m-1}). \quad \text{Why?} \]

If the remaining m − 1 vectors are still linearly dependent, we can repeat the process, writing
one of them as a linear combination of the remaining m − 2, relabeling, and then

\[ \mathrm{span}(S) = \mathrm{span}(e_1, \ldots, e_{m-2}). \]


We can continue this until we arrive finally at a "minimal" spanning set, say $\{e_1, \ldots, e_k\}$.


Such a set will be called a basis for V: 


Definition: The set $B = \{e_1, \ldots, e_k\}$ is a basis for the subspace V if

• span(B) = V.

• B is linearly independent.

Remark: In definitions like that given above, we really should put "iff" (if and only if) instead
of just "if", and that's the way you should read it. More precisely, if B is a basis, then B


spans V and is linearly independent. Conversely, if B spans V and is linearly independent, 


then B is a basis. 


Examples: 


[3] We use the word span in two ways: if V = span(S), then we say that S spans the subspace V.
In $R^3$, the set

\[ B = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} = \{e_1, e_2, e_3\} \]

is a basis.
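Why? (a) Any vector

\[ v = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \]

in $R^3$ can be written as $v = a e_1 + b e_2 + c e_3$, so B spans $R^3$. And (b): if $c_1 e_1 + c_2 e_2 + c_3 e_3 = 0$, then

\[ \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \]

which means that $c_1 = c_2 = c_3 = 0$, so the set is linearly independent.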


Definition: The set $\{e_1, e_2, e_3\}$ is called the standard basis for $R^3$.

Exercise: Any 4 vectors in $R^3$ are linearly dependent and therefore do not form a basis.
You should be able to supply the argument, which amounts to showing that a certain
homogeneous system of equations has a nontrivial solution.

Exercise: No 2 vectors can span $R^3$. Why not?

If a set B is a basis for $R^3$, then it contains exactly 3 elements. This is a consequence
of the previous two statements.

Exercise: Prove that any basis for $R^n$ has precisely n elements.


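Example: Find a basis for the null space of the matrix

\[ A = \begin{pmatrix} 1 & 0 & 0 & -3 & 2 \\ 0 & 1 & 0 & 1 & -1 \\ 0 & 0 & 1 & 2 & -3 \end{pmatrix}. \]

Solution: Since A is already in Gauss-Jordan form, we can just write down the general
solution to the homogeneous equation. These are the elements of the null space of A.
We have, setting $x_4 = s$ and $x_5 = t$,

\[ \begin{aligned} x_1 &= 3s - 2t \\ x_2 &= -s + t \\ x_3 &= -2s + 3t \\ x_4 &= s \\ x_5 &= t \end{aligned} \]

so the general solution to Ax = 0 is given by $\mathrm{Null}(A) = \{ s v_1 + t v_2 : s, t \in R \}$, where

\[ v_1 = \begin{pmatrix} 3 \\ -1 \\ -2 \\ 1 \\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -2 \\ 1 \\ 3 \\ 0 \\ 1 \end{pmatrix}. \]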


It is obvious by inspection of the last two entries in each that the set B = {v1, v2} is 


linearly independent. Furthermore, by construction, the set B spans the null space. 


So B is a basis. 


12.2 Dimension 


As we've seen above, any basis for $R^n$ has precisely n elements. Although we're not going
to prove it here, the same property holds for any subspace of $R^n$: the number of elements in
any basis for the subspace is the same. Given this, we make the following

Definition: Let V ≠ {0} be a subspace of $R^n$ for some n. The dimension of V, written
dim(V), is the number of elements in any basis of V.


Examples: 


• $\dim(R^n) = n$. Why?

• For the matrix A above, the dimension of the null space of A is 2.


• The subspace V = {0} is a bit peculiar: it doesn't have a basis according to our
definition, since its only nonempty subset, {0} itself, is linearly dependent. We extend
the definition of dimension to this case by defining dim(V) = 0.


Exercises: 


1. Show that the dimension of the null space of any matrix R in reduced echelon form is 


equal to the number of free variables in the echelon form. 


2. Show that the dimension of the set

{(x, y, z) such that 2x − 3y + z = 0}

is two.
13 The rank-nullity (dimension) theorem


13.1 Rank and nullity of a matrix 


Definition: The nullity of the matrix A is the dimension of the null space of A, and is denoted 


by N(A). 


Examples: The nullity of $I$ is 0. The nullity of the 3 × 5 matrix considered above is 2. The
nullity of $0_{m \times n}$ is n.


Definition: The rank of the matrix A is the dimension of the row space of A, and is denoted
R(A).

Examples: The rank of $I_{n \times n}$ is n; the rank of $0_{m \times n}$ is 0. The rank of the 3 × 5 matrix
considered above is 3.


Definition: The matrix B is said to be row equivalent to A if B can be obtained from A by
a finite sequence of elementary row operations. In pure matrix terms, this means precisely
that

\[ B = E_k E_{k-1} \cdots E_2 E_1 A, \]

where $E_1, \ldots, E_k$ are elementary row matrices. We can now establish two important results:

Theorem: If B is row equivalent to A, then Null(B) = Null(A).

Proof: Suppose x ∈ Null(A). Then Ax = 0. Since $B = E_k \cdots E_1 A$, it follows that
$Bx = E_k \cdots E_1 A x = E_k \cdots E_1 0 = 0$, so x ∈ Null(B), and therefore Null(A) ⊆ Null(B).
Conversely, if x ∈ Null(B), then Bx = 0. But B = CA, where C is invertible, being
the product of elementary matrices. Thus Bx = CAx = 0. Multiplying by $C^{-1}$ gives
$Ax = C^{-1} 0 = 0$, so x ∈ Null(A), and Null(B) ⊆ Null(A). So the two sets are equal, as
advertised.


Theorem: If B is row equivalent to A, then the row space of B is identical to that of A.

Proof: Suppose first that B = EA, where E is the matrix of some elementary row operation.
If E interchanges rows, then the span of the new set of rows is the same as the span of the
old set, so the row space doesn't change. Similarly if E multiplies one row by a nonzero
scalar. Finally, if E is the operation corresponding to $\mathrm{row}_i(A) \to \mathrm{row}_i(A) + c \cdot \mathrm{row}_j(A)$, then
the span of the rows of the new matrix is the same as the span of the rows of A (why?).
Since the theorem is true for any single row operation, it’s true for any finite number of 


them, which completes the proof. 


Summarizing these results: Row operations do not change either the row space or the null 


space of A. 


Corollary 1: If R is the Gauss-Jordan form of A, then R has the same null space and row 


space as A. 

Corollary 2: If B is row equivalent to A, then R(B) = R(A), and N(B) = N(A). 
Exercise: R(A) is equal to the number of leading 1’s in the echelon form of A. 
The following result may be somewhat surprising: 


Theorem: The number of linearly independent rows of the matrix A is equal to the number 
of linearly independent columns of A. In particular, the rank of A is also equal to the number 


of linearly independent columns. 


Proof (sketch): As an example, suppose that columns i, j, and k are linearly dependent,
with

\[ \mathrm{col}_i(A) = 2\,\mathrm{col}_j(A) - 3\,\mathrm{col}_k(A). \]


You should be able to convince yourself that doing any row operation on the matrix A 
doesn’t affect this equation. Even though the row operation changes the entries of the 
various columns, it changes them all in the same way, and this equation continues to hold. 
The span of the columns can, and generally will change under row operations (why?), but 


this doesn’t affect the result. 




The actual proof would consist of the following steps: (1) identify a maximal linearly in- 
dependent set of columns of A, (2) argue that this set remains linearly independent if row 
operations are done on A. (3) Then it follows that the number of linearly independent 
columns in the reduced echelon form of A is the same as the number of linearly independent 
columns in A. The number of linearly independent columns of A is then just the number of 
leading entries in the reduced echelon form of A which is, as we know, the same as the rank 


of A. 


13.2 The rank-nullity theorem 


This is also known as the dimension theorem, and version 1 (we’ll see another later in the 


course) goes as follows: 


Theorem: Let A be m x n. Then 
n= N(A) + R(A), 


where n is the number of columns of A. 


Let’s assume, for the moment, that this is true. What good is it? Answer: You can read 
off both the rank and the nullity from the echelon form of the matrix A. Suppose A can be 


row-reduced to 
\[ \begin{pmatrix} 1 & * & * & * & * \\ 0 & 0 & 1 & * & * \\ 0 & 0 & 0 & 1 & * \end{pmatrix}. \]
Then it’s clear (why?) that the dimension of the row space is 3, or equivalently, that the 
dimension of the column space is 3. Since there are 5 columns altogether, the dimension 
theorem says that n = 5 = 3+ N(A), so N(A) = 2. We can therefore expect to find two 


linearly independent solutions to the homogeneous equation Ax = 0. 


Alternatively, inspection of the echelon form of A reveals that there are precisely 2 free 


variables, x2 and x5. So we know that N(A) = 2 (why?), and therefore, rank(A) = 5-2 = 3. 
Proof of the theorem: This is, at this point, almost trivial. We have shown above that the
rank of A is the same as the rank of the Gauss-Jordan form of A which is clearly equal to 
the number of leading entries in the Gauss-Jordan form. We also know that the dimension 
of the null space is equal to the number of free variables in the reduced echelon (GJ) form 
of A. And we know further that the number of free variables plus the number of leading 


entries is exactly the number of columns. So 
n= N(A)+ R(A), 


as claimed. 
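
To see the counting argument in action, here is a small sketch that reduces a matrix to Gauss-Jordan form, reads off the pivot and free columns, and builds one null space vector per free variable. Everything in it is an assumption made for illustration: the helper names `rref` and `null_space_basis` and the sample matrix `M` do not come from the notes.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the Gauss-Jordan form and the list of pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column belongs to a free variable
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][col] for x in m[r]]      # scale to get a leading 1
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """One vector per free variable: set it to 1, the other free variables to 0."""
    R, pivots = rref(rows)
    n = len(R[0])
    basis = []
    for j in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[j] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][j]   # solve each pivot equation for its leading variable
        basis.append(v)
    return basis

M = [[1, 2, 0, 1],
     [2, 4, 1, 4],
     [3, 6, 1, 5]]            # illustrative matrix; row 3 = row 1 + row 2
R, pivots = rref(M)
rank, nullity = len(pivots), len(null_space_basis(M))
print(rank, nullity, rank + nullity == len(M[0]))
```

The rank is the number of leading entries and the nullity is the number of basis vectors produced from the free variables, so their sum is the number of columns, which is exactly the statement of the theorem.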
14 Change of basis


When we first set up a problem in mathematics, we normally use the most familiar coordinates.
In $R^3$, this means using the Cartesian coordinates x, y, and z. In vector terms, this
is equivalent to using what we've called the standard basis in $R^3$; that is, we write

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = x e_1 + y e_2 + z e_3, \]

where $\{e_1, e_2, e_3\}$ is the standard basis.


But, as you know, for any particular problem, there is often another coordinate system that 
simplifies the problem. For example, to study the motion of a planet around the sun, we put 
the sun at the origin, and use polar or spherical coordinates. This happens in linear algebra 


as well. 


Example: Let’s look at a simple system of two first order linear differential equations 


\[ \begin{aligned} \frac{dx_1}{dt} &= x_1 + 3x_2 \\ \frac{dx_2}{dt} &= 3x_1 + x_2 \end{aligned} \qquad (1) \]


Here, we seek functions $x_1(t)$ and $x_2(t)$ such that both equations hold simultaneously. Now
there's no problem solving a single differential equation like

\[ \frac{dx}{dt} = 3x. \]

In fact, we can see by inspection that $x(t) = c e^{3t}$ is a solution for any scalar c. The difficulty
with the system (1) is that $x_1$ and $x_2$ are "coupled", and the two equations must be solved
simultaneously. There are a number of straightforward ways to solve this system which
you'll learn when you take a course in differential equations, and we won't worry about that


here. 


But there’s also a sneaky way to solve (1) by changing coordinates. We'll do this at the end 


of the lecture. First, we need to see what happens in general when we change the basis. 
For simplicity, we're just going to work in $R^2$; generalization to higher dimensions is (really!)


straightforward. 


Suppose we have a basis $\{e_1, e_2\}$ for $R^2$. It doesn't have to be the standard basis. Then, by
the definition of basis, any vector $v \in R^2$ can be written as a linear combination of $e_1$ and $e_2$.
That is, there exist scalars $c_1$, $c_2$ such that $v = c_1 e_1 + c_2 e_2$.

Definition: The numbers $c_1$ and $c_2$ are called the coordinates of v in the basis $\{e_1, e_2\}$. And

\[ v_e = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \]

is called the coordinate vector of v in the basis $\{e_1, e_2\}$.
Theorem: The coordinates of the vector v are unique. 


Proof: Suppose there are two sets of coordinates for v. That is, suppose that $v = c_1 e_1 + c_2 e_2$,
and also that $v = d_1 e_1 + d_2 e_2$. Subtracting the two expressions for v gives

\[ 0 = (c_1 - d_1) e_1 + (c_2 - d_2) e_2. \]

But $\{e_1, e_2\}$ is linearly independent, so the coefficients in this expression must vanish:
$c_1 - d_1 = c_2 - d_2 = 0$. That is, $c_1 = d_1$ and $c_2 = d_2$, and the coordinates are unique, as claimed.
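Example: Let us use the basis

\[ \{e_1, e_2\} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -2 \\ 3 \end{pmatrix} \right\}, \]

and suppose

\[ v = \begin{pmatrix} 3 \\ 5 \end{pmatrix}. \]

Then we can find the coordinate vector $v_e$ in this basis in the usual way, by solving a system
of linear equations. We are looking for numbers $c_1$ and $c_2$ (the coordinates of v in this basis)
such that

\[ c_1 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} -2 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \end{pmatrix}. \]

In matrix form, this reads

\[ A v_e = v, \]

where

\[ A = \begin{pmatrix} 1 & -2 \\ 2 & 3 \end{pmatrix}, \quad v = \begin{pmatrix} 3 \\ 5 \end{pmatrix}, \quad v_e = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}. \]

We solve for $v_e$ by multiplying both sides by $A^{-1}$:

\[ v_e = A^{-1} v = \frac{1}{7} \begin{pmatrix} 3 & 2 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 5 \end{pmatrix} = \frac{1}{7} \begin{pmatrix} 19 \\ -1 \end{pmatrix} = \begin{pmatrix} 19/7 \\ -1/7 \end{pmatrix}. \]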
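
The computation in this example is easy to reproduce mechanically. The sketch below assumes the basis vectors $e_1 = (1, 2)^t$, $e_2 = (-2, 3)^t$ and the vector $v = (3, 5)^t$, which is what the surviving entries of the example indicate; the function name `solve_2x2` is invented for the illustration.

```python
from fractions import Fraction

def solve_2x2(A, v):
    """Solve A c = v for a nonsingular 2x2 matrix A, using the explicit inverse."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("the columns are linearly dependent, so they are not a basis")
    return [(d * v[0] - b * v[1]) / det, (-c * v[0] + a * v[1]) / det]

e1, e2 = (1, 2), (-2, 3)
A = [[e1[0], e2[0]],
     [e1[1], e2[1]]]          # the basis vectors are the columns of A
v = (3, 5)

c1, c2 = solve_2x2(A, v)
print(c1, c2)                 # 19/7 -1/7

# sanity check: c1*e1 + c2*e2 reproduces v
print([c1 * e1[i] + c2 * e2[i] for i in range(2)] == list(v))
```

The same function can be used to check an answer to the exercise that follows by changing `v`.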


Exercise: Find the coordinates of the vector $v = (-2, 4)^t$ in this basis.


14.1 Notation 


In this section, we'll develop a compact notation for the above computation that is easy to
remember. Start with an arbitrary basis $\{e_1, e_2\}$ and an arbitrary vector v. We know that

\[ v = c_1 e_1 + c_2 e_2, \]

where

\[ v_e = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \]

is the coordinate vector. We see that the expression for v is a linear combination of two
column vectors. And we know that such a thing can be obtained by writing down a certain
matrix product:

\[ v = c_1 e_1 + c_2 e_2 = (e_1 : e_2) \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}. \]

If we define the 2 × 2 matrix $E = (e_1 : e_2)$, then the expression for v can be simply written as

\[ v = E \cdot v_e. \]


Now suppose that $\{f_1, f_2\}$ is another basis for $R^2$. Then the same vector v can also be
written uniquely as a linear combination of these vectors. Of course it will have different
coordinates, and a different coordinate vector $v_f$. In matrix form, we'll have

\[ v = F \cdot v_f. \]

Exercise: Let $\{f_1, f_2\}$ be given by

\[ \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}. \]

If

\[ v = \begin{pmatrix} 3 \\ 5 \end{pmatrix} \]

(same vector as above), find $v_f$ and verify that $v = F \cdot v_f = E \cdot v_e$.

Remark: This works just the same in $R^n$, where $E = (e_1 : \cdots : e_n)$ is n × n, and $v_e$ is n × 1.


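Continuing along with our examples, since E is a basis, the vectors $f_1$ and $f_2$ can each be
written as linear combinations of $e_1$ and $e_2$. So there exist scalars a, b, c, d such that

\[ f_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = a \begin{pmatrix} 1 \\ 2 \end{pmatrix} + b \begin{pmatrix} -2 \\ 3 \end{pmatrix} \]
\[ f_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} = c \begin{pmatrix} 1 \\ 2 \end{pmatrix} + d \begin{pmatrix} -2 \\ 3 \end{pmatrix} \]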
—l 2 2 


We won’t worry now about the precise values of a, b, c, d, since you can easily solve for them. 


Definition: The change of basis matrix from E to F is

\[ P = \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \]

Note that this is the transpose of what you might think it should be; this is because we're
doing column operations, and it's the first column of P which takes linear combinations of
the columns of E and replaces the first column of E with the first column of F, and so on.
In matrix form, we have

\[ F = E \cdot P \]

and, of course, $E = F \cdot P^{-1}$.

Exercise: Find a, b, c, d and the change of basis matrix from E to F.


Given the change of basis matrix, we can figure out everything else we need to know. 


• Suppose v has the known coordinates $v_e$ in the basis E, and $F = E \cdot P$. Then

\[ v = E \cdot v_e = F \cdot P^{-1} \cdot v_e = F \cdot v_f. \]

Remember that the coordinate vector is unique. This means that

\[ v_f = P^{-1} v_e. \]

If P changes the basis from E to F, then $P^{-1}$ changes the coordinates from $v_e$ to $v_f$ [4].
Compare this with the example at the end of the first section.

• For any nonsingular matrix P, the following holds:

\[ v = E \cdot v_e = E \cdot P \cdot P^{-1} \cdot v_e = G \cdot v_g, \]

where P is the change of basis matrix from E to G: $G = E \cdot P$, and $P^{-1} \cdot v_e = v_g$ are
the coordinates of the vector v in this basis.


• This notation is consistent with the standard basis as well. Since

\[ e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \]

we have $E = I_2$, and $v = I_2 \cdot v$.

Remark: When we change from the standard basis to the basis $\{e_1, e_2\}$, the corresponding
matrices are I (for the standard basis) and E. So according to what's just
been shown, the change of basis matrix will be the matrix P which satisfies

\[ E = I \cdot P = P. \]

In other words, the change of basis matrix in this case is just the matrix E.
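
The relations F = E·P and $v_f = P^{-1} v_e$ can be checked numerically. The following sketch assumes the two bases used in the running example, with columns (1, 2), (−2, 3) for E and (1, 1), (1, −1) for F; the helper names `inverse`, `matmul` and `matvec` are hypothetical and only handle the 2 × 2 case.

```python
from fractions import Fraction

def inverse(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(M, v):
    return [M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]

E = [[1, -2], [2, 3]]     # columns e_1 = (1, 2), e_2 = (-2, 3)
F = [[1, 1], [1, -1]]     # columns f_1 = (1, 1), f_2 = (1, -1)
v = [3, 5]

P = matmul(inverse(E), F)        # change of basis matrix, chosen so that F = E P
v_e = matvec(inverse(E), v)      # coordinates of v in the basis E
v_f = matvec(inverse(P), v_e)    # coordinates in the basis F: v_f = P^{-1} v_e

print(matmul(E, P) == F)                          # F = E P
print(matvec(E, v_e) == v, matvec(F, v_f) == v)   # both coordinate vectors describe v
```

Computing P as $E^{-1} F$ is just the equation F = E·P solved for P; its columns hold the scalars a, b and c, d from the definition.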


[4] Warning: Some texts use $P^{-1}$ instead of P for the change of basis matrix. This is a convention, but you


need to check. 




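First example, cont'd: We can write the system of differential equations in matrix form as

\[ \dot{v} = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix} v = A v, \qquad (2) \]

where the dot indicates d/dt. We change from the standard basis to F via the matrix

\[ F = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \]

Then, according to what we've just worked out, we'll have

\[ v_f = F^{-1} v, \quad \text{and taking derivatives,} \quad \dot{v}_f = F^{-1} \dot{v}. \]

So using $v = F v_f$ and substituting into (2), we find

\[ F \dot{v}_f = A F v_f, \quad \text{or} \quad \dot{v}_f = F^{-1} A F v_f. \]

Now an easy computation shows that

\[ F^{-1} A F = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}, \]

and in the new coordinates, we have the system

\[ \begin{aligned} \dot{v}_{f1} &= 4 v_{f1} \\ \dot{v}_{f2} &= -2 v_{f2} \end{aligned} \]

In the new coordinates, the system is now decoupled and easily solved to give

\[ \begin{aligned} v_{f1} &= c_1 e^{4t} \\ v_{f2} &= c_2 e^{-2t}, \end{aligned} \]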


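where $c_1$, $c_2$ are arbitrary constants of integration. We can now transform back to the original
(standard) basis to get the solution in the original coordinates:

\[ v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = F v_f = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} c_1 e^{4t} \\ c_2 e^{-2t} \end{pmatrix} = c_1 e^{4t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + c_2 e^{-2t} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]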
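
The "easy computation" behind the decoupling can be confirmed with a few lines of exact arithmetic. This is a sketch only: it assumes the coefficient matrix with rows (1, 3) and (3, 1) and the matrix F with columns (1, 1) and (1, −1), and the helper names are made up.

```python
from fractions import Fraction

def inverse(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 3], [3, 1]]      # coefficient matrix of the system
F = [[1, 1], [1, -1]]     # new basis vectors as columns

D = matmul(inverse(F), matmul(A, F))   # matrix of the system in the new coordinates
print(D == [[4, 0], [0, -2]])

# each column of F is mapped by A to a multiple of itself
for col, factor in ((0, 4), (1, -2)):
    f = [F[0][col], F[1][col]]
    Af = [A[i][0] * f[0] + A[i][1] * f[1] for i in range(2)]
    print(Af == [factor * x for x in f])
```

The second check explains why the off-diagonal entries vanish: A sends each column of F to a multiple of that same column.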


A reasonable question at this point is "How does one come up with this new basis F?" It
clearly was not chosen at random. The answer has to do with the eigenvalues and eigenvectors
of the coefficient matrix of the differential equation, namely the matrix

\[ A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}. \]


All of which brings us to the subject of the next lecture. 
15 Matrices and Linear transformations


We have been thinking of matrices in connection with solutions to linear systems of equations 


like Ax = b. It is time to broaden our horizons a bit and start thinking of matrices as 


functions. In particular, if A is m × n, we can use A to define a function $f_A$ from $R^n$ to $R^m$
which sends $v \in R^n$ to $Av \in R^m$. That is, $f_A(v) = Av$.


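Example: Let

\[ A_{2 \times 3} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}. \]

If

\[ v = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in R^3, \]

then

\[ f_A(v) = Av = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + 2y + 3z \\ 4x + 5y + 6z \end{pmatrix} \]

sends the vector $v \in R^3$ to $Av \in R^2$.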


Definition: A function $f : R^n \to R^m$ is said to be linear if

• $f(v_1 + v_2) = f(v_1) + f(v_2)$, and

• $f(cv) = c f(v)$

for all $v_1, v_2, v \in R^n$ and for all scalars c.


A linear function f is also known as a linear transformation. 
Examples:

• Define $f : R^3 \to R$ by

\[ f \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 3x - 2y + z. \]

Then f is linear because for any

\[ v_1 = \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}, \quad \text{and} \quad v_2 = \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix}, \]

we have

\[ f(v_1 + v_2) = f \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \\ z_1 + z_2 \end{pmatrix} = 3(x_1 + x_2) - 2(y_1 + y_2) + (z_1 + z_2). \]

And the right hand side can be rewritten as $(3x_1 - 2y_1 + z_1) + (3x_2 - 2y_2 + z_2)$, which
is the same as $f(v_1) + f(v_2)$. So the first property holds. So does the second, since
$f(cv) = 3cx - 2cy + cz = c(3x - 2y + z) = c f(v)$.

• Notice that the function f is actually $f_A$ for the right A: if $A_{1 \times 3} = (3, -2, 1)$, then
f(v) = Av.

• If $A_{m \times n}$ is a matrix, then $f_A : R^n \to R^m$ is a linear transformation because
$f_A(v_1 + v_2) = A(v_1 + v_2) = Av_1 + Av_2 = f_A(v_1) + f_A(v_2)$. And $A(cv) = cAv \Rightarrow f_A(cv) = c f_A(v)$.
(These are two fundamental properties of matrix multiplication.)


• Although we don't give the proof, it can be shown that any linear transformation from $R^n$ to $R^m$ can
be written as $f_A$ for a suitable matrix A.


e The derivative (see Lecture 9) is a linear transformation. Df(a) is the linear approxi- 


mation to f(x) — f(a). 


• There are many other examples of linear transformations; some of the most interesting
ones do not go from $R^n$ to $R^m$:

1. If f and g are differentiable functions, then

\[ \frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx}, \quad \text{and} \quad \frac{d}{dx}(cf) = c \frac{df}{dx}. \]

Thus the function D(f) = df/dx is linear.

2. If f is continuous, then we can define

\[ I(f)(x) = \int_0^x f(s)\, ds, \]

and I is linear, by well-known properties of the integral.

3. The Laplace operator, Δ, defined before, is linear.

4. Let y be twice continuously differentiable and define

\[ L(y) = y'' - 2y' - 8y. \]

Then L is linear, as you can (and should!) verify.


Linear transformations acting on functions, like the above, are generally known as linear 
operators. They’re a bit more complicated than matrix multiplication operators, but 


they have the same essential property of linearity. 
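
The two defining properties of linearity can be spot-checked numerically for the map $f_A$ of the first example. This is a sketch under obvious limitations: the test vectors and the scalar are arbitrary choices, the function names `f_A` and `g` are invented, and a check on a few inputs can refute linearity but never prove it.

```python
def f_A(A, v):
    """The map v -> A v, with the matrix A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]
v1, v2, c = [1, 0, -2], [3, 1, 4], 5      # arbitrary test data

additive = f_A(A, [x + y for x, y in zip(v1, v2)]) == \
    [p + q for p, q in zip(f_A(A, v1), f_A(A, v2))]
homogeneous = f_A(A, [c * x for x in v1]) == [c * p for p in f_A(A, v1)]
print(additive, homogeneous)

def g(v):
    """A map from R^2 to itself that is not linear: (x, y) -> (x*y, x)."""
    return [v[0] * v[1], v[0]]

u, w = [1, 2], [3, 4]
print(g([u[0] + w[0], u[1] + w[1]]) == [p + q for p, q in zip(g(u), g(w))])
```

The last line prints False: g(u + w) differs from g(u) + g(w), so a single pair of vectors is enough to show that g is not linear.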


Exercises: 


1. Give an example of a function from R? to itself which is not linear. 


2. Identify all the linear transformations from R to R. 


3. If $f : R^n \to R^m$ is linear, then

\[ \mathrm{Ker}(f) := \{ v \in R^n \text{ such that } f(v) = 0 \} \]

is a subspace of $R^n$, called the kernel of f.

4. If $f : R^n \to R^m$ is linear, then

\[ \mathrm{Range}(f) = \{ y \in R^m \text{ such that } y = f(v) \text{ for some } v \} \]

is a subspace of $R^m$ called the range of f.
Everything we've been doing regarding the solution of linear systems of equations can be
recast in the framework of linear transformations. In particular, if $f_A$ is multiplication by
some matrix A, then the range of $f_A$ is just the set of all y such that the linear system
Av = y has a solution (i.e., it's the column space of A). And the kernel of $f_A$ is the set of
all solutions to the homogeneous equation Av = 0.


15.1 The rank-nullity theorem - version 2 


Recall that for $A_{m \times n}$, we have n = N(A) + R(A). Now think of A as the linear transformation
$f_A : R^n \to R^m$. The domain of $f_A$ is $R^n$; $\mathrm{Ker}(f_A)$ is the null space of A, and $\mathrm{Range}(f_A)$ is the
column space of A. We can therefore restate the rank-nullity theorem as the

Dimension theorem: Let $f_A : R^n \to R^m$. Then

\[ \dim(\mathrm{domain}(f_A)) = \dim(\mathrm{Range}(f_A)) + \dim(\mathrm{Ker}(f_A)). \]


15.2 Choosing a useful basis for A 


We now want to study square matrices, regarding an n × n matrix A as a linear transformation
from $R^n$ to itself. We'll just write Av for $f_A(v)$ to simplify the notation, and to keep things
really simple, we'll just talk about 2 × 2 matrices; all the problems that exist in higher
dimensions are present in $R^2$.


There are several questions that present themselves: 


• Can we visualize the linear transformation x → Ax? One thing we can't do in general


is draw a graph! Why not? 


e Connected with the first question is: can we choose a better coordinate system in which 


to view the problem? 
The answer is not an unequivocal "yes" to either of these, but we can generally do some


useful things. 


To pick up at the end of the last lecture, note that when we write $f_A(v) = y = Av$, we are
actually using the coordinate vector of v in the standard basis. Suppose we change to some
other basis $\{e_1, e_2\}$ using the invertible matrix E. Then we can rewrite the equation in the
new coordinates and basis:

We have $v = E v_e$, and $y = E y_e$, so

\[ \begin{aligned} y &= Av \\ E y_e &= A E v_e, \text{ and} \\ y_e &= E^{-1} A E v_e. \end{aligned} \]

That is, the matrix equation y = Av is given in the new basis by the equation

\[ y_e = E^{-1} A E v_e. \]


Definition: The matrix $E^{-1} A E$ will be denoted by $A_e$ and called the matrix of the linear
transformation in the basis E.

We can now restate the second question: Can we find a nonsingular matrix E so that $E^{-1} A E$
is particularly useful?

Definition: The matrix A is diagonal if the only nonzero entries lie on the main diagonal.
That is, $a_{ij} = 0$ if $i \neq j$.


Example:

\[ A = \begin{pmatrix} 4 & 0 \\ 0 & -3 \end{pmatrix} \]

is diagonal. This is useful because we can (partially) visualize the linear transformation
corresponding to multiplication by A: a vector v lying along the first coordinate axis is
mapped to 4v, a multiple of itself. A vector w lying along the second coordinate axis is
also mapped to a multiple of itself: Aw = −3w. Its length is tripled, and its direction is
reversed. An arbitrary vector $(a, b)^t$ is a linear combination of the basis vectors, and it's
mapped to $(4a, -3b)^t$.


It turns out that we can find vectors like v and w, which are mapped to multiples of them- 


selves, without first finding the matrix E. This is the subject of the next few sections. 


15.3 Eigenvalues and eigenvectors

Definitions: If a vector v ≠ 0 satisfies the equation Av = λv, for some real number λ,
then λ is said to be an eigenvalue of the matrix A, and v is said to be an eigenvector of A
corresponding to λ.


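Example: If

\[ A = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}, \quad \text{and} \quad v = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \]

then

\[ Av = \begin{pmatrix} 5 \\ 5 \end{pmatrix} = 5v. \]

So λ = 5 is an eigenvalue of A, and v an eigenvector corresponding to this eigenvalue.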


Remark: Note that the definition of eigenvector requires that v ≠ 0. The reason for this is
that if v = 0 were allowed, then any number λ would be an eigenvalue since the statement
A0 = λ0 holds for any λ. On the other hand, we can have λ = 0, and v ≠ 0. See the
exercise below.


Exercises: 


1. Show that

\[ \begin{pmatrix} 1 \\ -1 \end{pmatrix} \]

is also an eigenvector of the matrix A above. What's the eigenvalue?

2. Eigenvectors are not unique. Show that if v is an eigenvector for A, then so is cv, for
any real number c ≠ 0.

3. Suppose λ is an eigenvalue of A.

Definition:

\[ E_\lambda = \{ v \in R^n \text{ such that } Av = \lambda v \} \]

is called the eigenspace of A corresponding to the eigenvalue λ.

Show that $E_\lambda$ is a subspace of $R^n$. (N.B.: the definition of $E_\lambda$ does not require v to be
an eigenvector of A, so v = 0 is allowed; otherwise, it wouldn't be a subspace.)

4. $E_0 = \mathrm{Ker}(f_A)$ is just the null space of the matrix A.


Example: The matrix

\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \cos(\pi/2) & -\sin(\pi/2) \\ \sin(\pi/2) & \cos(\pi/2) \end{pmatrix} \]

represents a counterclockwise rotation through the angle π/2. Apart from 0, there is no
vector which is mapped by A to a multiple of itself. So not every matrix has eigenvectors.


Exercise: What are the eigenvalues of this matrix? 
16 Computations with eigenvalues and eigenvectors


How do we find the eigenvalues and eigenvectors of a matrix A? 


Suppose v ≠ 0 is an eigenvector. Then for some λ ∈ R, Av = λv. Then

\[ \begin{aligned} Av - \lambda v &= 0, \text{ or, equivalently,} \\ (A - \lambda I) v &= 0. \end{aligned} \]

So v is a nontrivial solution to the homogeneous system of equations determined by the
square matrix A − λI. This can only happen if det(A − λI) = 0. On the other hand, if λ is a
real number such that det(A − λI) = 0, this means exactly that there's a nontrivial solution v
to (A − λI)v = 0. So λ is an eigenvalue, and v ≠ 0 is an eigenvector. Summarizing, we
have the

Theorem: λ is an eigenvalue of A if and only if det(A − λI) = 0.


For a 2 × 2 matrix

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

we compute

\[ \det(A - \lambda I) = \det \begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = \lambda^2 - (a + d)\lambda + (ad - bc). \]

This polynomial of degree 2 is called the characteristic polynomial of the matrix A, and is
denoted by $p_A(\lambda)$. By the above theorem, the eigenvalues of A are just the roots of the
characteristic polynomial. The equation for the roots, $p_A(\lambda) = 0$, is called the characteristic
equation of A.


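Example: If

\[ A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}, \]

then

\[ A - \lambda I = \begin{pmatrix} 1 - \lambda & 3 \\ 3 & 1 - \lambda \end{pmatrix}, \quad \text{and} \quad p_A(\lambda) = (1 - \lambda)^2 - 9 = \lambda^2 - 2\lambda - 8. \]

This factors as $p_A(\lambda) = (\lambda - 4)(\lambda + 2)$, so there are two eigenvalues: $\lambda_1 = 4$, and $\lambda_2 = -2$.

We should be able to find an eigenvector for each of these eigenvalues. To do so, we must
find a nontrivial solution to the corresponding homogeneous equation
(A − λI)v = 0. For $\lambda_1 = 4$, we have the homogeneous system

\[ \begin{pmatrix} 1 - 4 & 3 \\ 3 & 1 - 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -3 & 3 \\ 3 & -3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

This leads to the two equations $-3x_1 + 3x_2 = 0$, and $3x_1 - 3x_2 = 0$. Notice that the first
equation is a multiple of the second, so there's really only one equation to solve.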


Exercise: What property of the matrix A − λI guarantees that one of these equations will


be a multiple of the other? 


The general solution to the homogeneous system can be written as

\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = c \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \text{where c is arbitrary.} \]

This one-dimensional subspace of $R^2$ is what we called $E_4$ in the last section.

We get an eigenvector by choosing any nonzero element of $E_4$. Taking c = 1 gives the
eigenvector

\[ v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]
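
For 2 × 2 matrices the whole procedure reduces to the quadratic formula applied to $\lambda^2 - (a + d)\lambda + (ad - bc)$. The sketch below is an illustration only; the function name `eigenvalues_2x2` is made up, floating point arithmetic is used for the square root, and complex roots are simply reported as "no real eigenvalues".

```python
import math

def eigenvalues_2x2(A):
    """Real roots of p_A(x) = x^2 - (a + d) x + (ad - bc), in increasing order.

    Returns an empty list when the roots are complex.
    """
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det        # discriminant of the characteristic polynomial
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})

print(eigenvalues_2x2([[1, 3], [3, 1]]))    # [-2.0, 4.0]
print(eigenvalues_2x2([[0, -1], [1, 0]]))   # []  (rotation through pi/2)
print(eigenvalues_2x2([[1, 0], [0, 1]]))    # [1.0]  (repeated root)
```

Once an eigenvalue λ is known, an eigenvector is any nonzero solution of (A − λI)v = 0, found by row reduction exactly as in the example.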
Exercises:

1. Find the subspace $E_{-2}$ and show that

\[ v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \]

is an eigenvector corresponding to $\lambda_2 = -2$.

2. Find the eigenvalues and corresponding eigenvectors of the matrix

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}. \]

3. Same question for the matrix

\[ A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \]

16.1 Some observations 


What are the possibilities for the characteristic polynomial $p_A$? It's of degree 2, so there are
3 cases:

1. The two roots are real and distinct: $\lambda_1 \neq \lambda_2$, $\lambda_1, \lambda_2 \in R$. We just worked out an
example of this.

2. The roots are complex conjugates of one another: $\lambda_1 = a + ib$, $\lambda_2 = a - ib$.


Example:

\[ A = \begin{pmatrix} 2 & 3 \\ -3 & 2 \end{pmatrix}. \]

Here, $p_A(\lambda) = \lambda^2 - 4\lambda + 13 = 0$ has the two roots $\lambda_\pm = 2 \pm 3i$. Now there's certainly
no real vector v with the property that $Av = (2 \pm 3i)v$, so there are no eigenvectors
in the usual sense. But there are complex eigenvectors corresponding to the complex
eigenvalues. For example, if

\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \]

$p_A(\lambda) = \lambda^2 + 1$ has the complex eigenvalues $\lambda_\pm = \pm i$. You can easily check that
Av = iv, where

\[ v = \begin{pmatrix} i \\ 1 \end{pmatrix}. \]

We won't worry about complex eigenvectors in this course.


3. $p_A(\lambda)$ has a repeated root. An example is

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2. \]

Here $p_A(\lambda) = (1 - \lambda)^2$ and λ = 1 is the only eigenvalue. The matrix A − λI is the
zero matrix. So there are no restrictions on the components of the eigenvectors. Any
nonzero vector in $R^2$ is an eigenvector corresponding to this eigenvalue.

But for

\[ A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \]

as you saw in the exercise above, we also have $p_A(\lambda) = (1 - \lambda)^2$. In this case, though,
there is just a one-dimensional eigenspace.


16.2 Diagonalizable matrices 


Example: In the preceding lecture, we showed that, for the matrix

\[ A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}, \]

if we change the basis using

\[ E = (e_1 : e_2) = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \]

then, in this new basis, we have

\[ A_e = E^{-1} A E = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}. \]

The matrix $A_e$ is called a diagonal matrix; the only non-zero entries lie on the main diagonal.

Definition: Let A be n × n. We say that A is diagonalizable if there exists a basis $\{e_1, \ldots, e_n\}$ of $R^n$,
with corresponding change of basis matrix $E = (e_1 : \cdots : e_n)$, such that

\[ A_e = E^{-1} A E \]

is diagonal.

In the example, our matrix E has the form $E = (e_1 : e_2)$, where the two columns are two
eigenvectors of A corresponding to the eigenvalues λ = 4 and λ = −2. In fact, this is the


general recipe: 


Theorem: The matrix A is diagonalizable ⟺ there is a basis for $R^n$ consisting of eigenvectors
of A.

Proof: Suppose $\{e_1, \ldots, e_n\}$ is a basis for $R^n$ with the property that $A e_j = \lambda_j e_j$, $1 \le j \le n$.
Form the matrix $E = (e_1 : e_2 : \cdots : e_n)$. We have

\[ \begin{aligned} AE &= (Ae_1 : Ae_2 : \cdots : Ae_n) \\ &= (\lambda_1 e_1 : \lambda_2 e_2 : \cdots : \lambda_n e_n) \\ &= ED, \end{aligned} \]

where $D = \mathrm{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Evidently, $A_e = E^{-1} A E = D$ and A is diagonalizable. Conversely, if
A is diagonalizable, then the columns of the matrix which diagonalizes A are the required
basis of eigenvectors.

So, in $R^2$, a matrix A can be diagonalized ⟺ we can find two linearly independent
eigenvectors. (To diagonalize a matrix A means to find a matrix E such that $E^{-1} A E$ is
diagonal.)


Examples: 


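• Diagonalize the matrix

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}. \]

Solution: From the previous exercise set, we have $\lambda_1 = 3$, $\lambda_2 = -2$ with corresponding eigenvectors

\[ \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} -2 \\ 3 \end{pmatrix}. \]

We form the matrix

\[ E = (\mathbf{v}_1 : \mathbf{v}_2) = \begin{pmatrix} 1 & -2 \\ 1 & 3 \end{pmatrix}, \quad\text{with}\quad E^{-1} = (1/5)\begin{pmatrix} 3 & 2 \\ -1 & 1 \end{pmatrix}, \]

and check that $E^{-1}AE = \mathrm{Diag}(3, -2)$. Of course, we don't really need to check: the result is guaranteed by the theorem above!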


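• The matrix

\[ A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \]

has only the one-dimensional eigenspace spanned by the eigenvector

\[ \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \]

There is no basis of $\mathbb{R}^2$ consisting of eigenvectors of $A$, so this matrix cannot be diagonalized.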


This can only happen in the case of repeated or complex roots because of the following

Theorem: If $\lambda_1$ and $\lambda_2$ are distinct eigenvalues of $A$, with corresponding eigenvectors $\mathbf{v}_1, \mathbf{v}_2$, then $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly independent.

Proof: Suppose $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = \mathbf{0}$, where one of the coefficients, say $c_1$, is nonzero. Then $\mathbf{v}_1 = \alpha\mathbf{v}_2$ for some $\alpha \ne 0$. (If $\alpha = 0$, then $\mathbf{v}_1 = \mathbf{0}$, and $\mathbf{0}$ is by definition not an eigenvector.) Multiplying both sides on the left by $A$ gives

\[ A\mathbf{v}_1 = \lambda_1\mathbf{v}_1 = \alpha A\mathbf{v}_2 = \alpha\lambda_2\mathbf{v}_2. \]

On the other hand, multiplying the same equation by $\lambda_1$ gives $\lambda_1\mathbf{v}_1 = \alpha\lambda_1\mathbf{v}_2$, and subtracting the two equations gives

\[ \mathbf{0} = \alpha(\lambda_2 - \lambda_1)\mathbf{v}_2, \]

which is impossible, since none of $\alpha$, $(\lambda_2 - \lambda_1)$, and $\mathbf{v}_2$ is zero.

It follows that if $A_{2\times 2}$ has two distinct eigenvalues, then it has two linearly independent eigenvectors and can be diagonalized. In a similar way, if $A_{n\times n}$ has $n$ distinct eigenvalues, it is diagonalizable.
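
The recipe "put the eigenvectors in the columns of $E$" can be checked by direct computation. The following is a minimal Python sketch, not part of the original notes: the helper names `matmul` and `inverse_2x2` are illustrative, exact rational arithmetic is used to avoid rounding, and the matrix is taken to be $A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}$ with eigenvectors $(1,1)^t$ and $(-2,3)^t$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2], [3, 0]]
E = [[1, -2], [1, 3]]          # columns are the eigenvectors v1 and v2

# A_E = E^{-1} A E should be diagonal, with the eigenvalues on the diagonal.
A_E = matmul(inverse_2x2(E), matmul(A, E))
print(A_E)
```

The order of the columns of `E` fixes the order of the eigenvalues on the diagonal; swapping the columns would swap the two diagonal entries.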


Exercises: 


1. Find the eigenvalues and eigenvectors of the matrix 


A= 
1 3 


Form the matrix $E$ and verify that $E^{-1}AE$ is diagonal.


2. List the two reasons a matrix may fail to be diagonalizable. Give examples of both cases.

3. An arbitrary $2 \times 2$ symmetric matrix ($A = A^t$) has the form

\[ A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \]

where $a, b, c$ can be any real numbers. Show that $A$ always has real eigenvalues. When are the two eigenvalues equal?

17 Inner products


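Up until now, we have only examined the properties of vectors and matrices in $\mathbb{R}^n$. But normally, when we think of $\mathbb{R}^n$, we're really thinking of n-dimensional Euclidean space, that is, $\mathbb{R}^n$ together with the dot product. Once we have the dot product, or more generally an "inner product" on $\mathbb{R}^n$, we can talk about angles, lengths, distances, etc.

Definition: An inner product on $\mathbb{R}^n$ is a function

\[ (\ ,\ ) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R} \]

with the following properties:

1. It is bilinear, meaning it's linear in each argument:

   • $(c_1\mathbf{x}_1 + c_2\mathbf{x}_2, \mathbf{y}) = c_1(\mathbf{x}_1, \mathbf{y}) + c_2(\mathbf{x}_2, \mathbf{y})$, and

   • $(\mathbf{x}, c_1\mathbf{y}_1 + c_2\mathbf{y}_2) = c_1(\mathbf{x}, \mathbf{y}_1) + c_2(\mathbf{x}, \mathbf{y}_2)$.

2. It is symmetric: $(\mathbf{x}, \mathbf{y}) = (\mathbf{y}, \mathbf{x})$, $\forall \mathbf{x}, \mathbf{y} \in \mathbb{R}^n$.

3. It is non-degenerate: If $(\mathbf{x}, \mathbf{y}) = 0$, $\forall \mathbf{y} \in \mathbb{R}^n$, then $\mathbf{x} = \mathbf{0}$.

The inner product is said to be positive definite if, in addition

4. $(\mathbf{x}, \mathbf{x}) > 0$ whenever $\mathbf{x} \ne \mathbf{0}$.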


Examples of inner products

• The dot product in $\mathbb{R}^n$, given in the standard basis by

\[ (\mathbf{x}, \mathbf{y}) = \mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n. \]

The dot product is positive definite: all four of the properties above hold (exercise).

$\mathbb{R}^n$ with the dot product as an inner product is called n-dimensional Euclidean space, and is denoted $\mathbb{E}^n$.

• In $\mathbb{R}^4$, with coordinates $t, x, y, z$, we can define

\[ (\mathbf{v}_1, \mathbf{v}_2) = t_1t_2 - x_1x_2 - y_1y_2 - z_1z_2. \]

This is an inner product too. But for $\mathbf{x} = (1, 1, 0, 0)^t$, we have $(\mathbf{x}, \mathbf{x}) = 0$, so it's not positive definite. $\mathbb{R}^4$ with this inner product is called Minkowski space. It is the spacetime of special relativity (invented by Einstein in 1905, and made into a nice geometric space by Minkowski several years later). It is denoted $\mathbb{M}^4$, and if time permits, we'll look more closely at this space later in the course.


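• Let $G$ be an $n \times n$ symmetric matrix ($G = G^t$), with $\det(G) \ne 0$. Define

\[ (\mathbf{x}, \mathbf{y})_G = \mathbf{x}^tG\mathbf{y}. \]

It is not difficult to verify that this satisfies the properties in the definition. For example, if $(\mathbf{x}, \mathbf{y})_G = \mathbf{x}^tG\mathbf{y} = 0$ $\forall \mathbf{y}$, then $\mathbf{x}^tG = \mathbf{0}$, because if we write $\mathbf{x}^tG$ as the row vector $(a_1, a_2, \ldots, a_n)$, then $\mathbf{x}^tG\mathbf{e}_1 = 0 \Rightarrow a_1 = 0$, $\mathbf{x}^tG\mathbf{e}_2 = 0 \Rightarrow a_2 = 0$, etc. So all the components of $\mathbf{x}^tG$ are 0 and hence $\mathbf{x}^tG = \mathbf{0}$. Now taking transposes, we find that $G^t\mathbf{x} = G\mathbf{x} = \mathbf{0}$. Since $\det(G) \ne 0$, $G$ is nonsingular, and this means that $\mathbf{x} = \mathbf{0}$ (otherwise the homogeneous system $G\mathbf{x} = \mathbf{0}$ would have non-trivial solutions and $G$ would be singular). So the inner product is non-degenerate.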


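In fact, any inner product on $\mathbb{R}^n$ can be written in this form for a suitable matrix $G$:

* $\mathbf{x}\cdot\mathbf{y} = \mathbf{x}^tG\mathbf{y}$ with $G = I$. For instance, if

\[ \mathbf{x} = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}, \quad\text{and}\quad \mathbf{y} = \begin{pmatrix} -1 \\ 2 \\ 4 \end{pmatrix}, \]

then

\[ \mathbf{x}\cdot\mathbf{y} = \mathbf{x}^tI\mathbf{y} = \mathbf{x}^t\mathbf{y} = (3, 2, 1)\begin{pmatrix} -1 \\ 2 \\ 4 \end{pmatrix} = -3 + 4 + 4 = 5. \]

* The Minkowski inner product has the form $\mathbf{x}^tG\mathbf{y}$ with $G = \mathrm{Diag}(1, -1, -1, -1)$.

Remark: If we replace $\mathbf{y}$ by $\mathbf{x}$ in all of the above, we get what's called a quadratic form, which is a function of just the single vector variable $\mathbf{x}$. Its general form is $\mathbf{x}^tG\mathbf{x}$. It's no longer linear in $\mathbf{x}$, but quadratic (hence the name).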
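
Both the dot product and the Minkowski inner product are instances of the single formula $\mathbf{x}^tG\mathbf{y}$, which makes them easy to evaluate with one routine. Here is a small Python sketch under that reading; the function name `inner` and the variable names are illustrative choices, not notation from the notes.

```python
def inner(x, G, y):
    """Return x^t G y for vectors x, y (lists) and a square matrix G (list of rows)."""
    Gy = [sum(G[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    return sum(x[i] * Gy[i] for i in range(len(x)))


identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
minkowski = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

# Dot product of (3, 2, 1)^t and (-1, 2, 4)^t, i.e. G = I.
dot_value = inner([3, 2, 1], identity3, [-1, 2, 4])

# Quadratic form x^t G x for the Minkowski matrix at x = (1, 1, 0, 0)^t:
# a nonzero vector whose "squared length" vanishes.
null_value = inner([1, 1, 0, 0], minkowski, [1, 1, 0, 0])

print(dot_value, null_value)
```

Passing the same vector in both slots, as in the second call, evaluates the quadratic form from the remark above.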


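Exercise**: Show that under a change of basis matrix $E$, the matrix $G$ of the inner product becomes $G_E = E^tGE$. For instance, if $G = I$, so that $\mathbf{x}\cdot\mathbf{y} = \mathbf{x}^tI\mathbf{y}$, and

\[ E = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}, \quad\text{then}\quad \mathbf{x}\cdot\mathbf{y} = \mathbf{x}_E^tG_E\mathbf{y}_E, \quad\text{with}\quad G_E = E^tE = \begin{pmatrix} 10 & 6 \\ 6 & 10 \end{pmatrix}. \]

This is different from the way in which an ordinary matrix (which can be viewed as a linear transformation) behaves. Thus the matrix representing an inner product is a different object from that representing a linear transformation.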


17.1 Euclidean space 


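We now restrict attention to Euclidean space $\mathbb{E}^n$. We'll always be using the dot product, whether we write it as $\mathbf{x}\cdot\mathbf{y}$ or $(\mathbf{x}, \mathbf{y})$.

Definition: The norm of the vector $\mathbf{x}$ is defined by

\[ \|\mathbf{x}\| = \sqrt{\mathbf{x}\cdot\mathbf{x}}. \]

In the standard coordinates, this is equal to

\[ \|\mathbf{x}\| = \left(\sum_{i=1}^n x_i^2\right)^{1/2}. \]

Example: If $\mathbf{x} = (-2, 4, 1)^t$, then $\|\mathbf{x}\| = \sqrt{(-2)^2 + 4^2 + 1^2} = \sqrt{21}$.

Proposition:

• $\|\mathbf{x}\| > 0$ if $\mathbf{x} \ne \mathbf{0}$.

• $\|c\mathbf{x}\| = |c|\,\|\mathbf{x}\|$, $\forall c \in \mathbb{R}$.

Proof: Exercise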


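As you know, $\|\mathbf{x}\|$ is the distance from the origin $\mathbf{0}$ to the point $\mathbf{x}$. Or it's the length of the vector $\mathbf{x}$. (Same thing.) The next few properties all follow from the law of cosines:

For a triangle with sides $a$, $b$, and $c$, and angles opposite these sides of $A$, $B$, and $C$,

\[ c^2 = a^2 + b^2 - 2ab\cos(C). \]

This reduces to Pythagoras' theorem if $C$ is a right angle, of course. In the present context, we imagine two vectors $\mathbf{x}$ and $\mathbf{y}$ with their "tails" located at $\mathbf{0}$. The vector going from the tip of $\mathbf{y}$ to the tip of $\mathbf{x}$ is $\mathbf{x} - \mathbf{y}$. If $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$, then the law of cosines reads

\[ \|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\|\mathbf{x}\|\,\|\mathbf{y}\|\cos\theta. \tag{1} \]

On the other hand, from the definition of the norm, we have

\[ \begin{aligned} \|\mathbf{x} - \mathbf{y}\|^2 &= (\mathbf{x} - \mathbf{y})\cdot(\mathbf{x} - \mathbf{y}) \\ &= \mathbf{x}\cdot\mathbf{x} - \mathbf{x}\cdot\mathbf{y} - \mathbf{y}\cdot\mathbf{x} + \mathbf{y}\cdot\mathbf{y}, \quad\text{or} \\ \|\mathbf{x} - \mathbf{y}\|^2 &= \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\,\mathbf{x}\cdot\mathbf{y}. \end{aligned} \tag{2} \]

Comparing (1) and (2), we conclude that

\[ \mathbf{x}\cdot\mathbf{y} = \cos\theta\,\|\mathbf{x}\|\,\|\mathbf{y}\|, \quad\text{or}\quad \cos\theta = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}. \tag{3} \]

Since $|\cos\theta| \le 1$, taking absolute values we get

\[ \frac{|\mathbf{x}\cdot\mathbf{y}|}{\|\mathbf{x}\|\,\|\mathbf{y}\|} \le 1, \quad\text{or}\quad |\mathbf{x}\cdot\mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|. \tag{4} \]

The inequality (4) is known as the Cauchy-Schwarz inequality.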
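
Equation (3) gives a direct way to compute the angle between two nonzero vectors. The sketch below is an illustration in Python, assuming vectors are stored as lists of floats; the function names `dot`, `norm`, and `angle` are hypothetical helpers, and the sample vectors are chosen only because their angle is obviously $\pi/4$.

```python
import math


def dot(x, y):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm, ||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))


def angle(x, y):
    """Angle in radians between nonzero vectors x and y, from equation (3)."""
    c = dot(x, y) / (norm(x) * norm(y))
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of range.
    return math.acos(max(-1.0, min(1.0, c)))


x, y = [1.0, 0.0], [1.0, 1.0]
theta = angle(x, y)                                   # pi/4 for this pair
cauchy_schwarz_holds = abs(dot(x, y)) <= norm(x) * norm(y)
print(theta, cauchy_schwarz_holds)
```

The clamp is a floating-point precaution only: by the Cauchy-Schwarz inequality (4), the exact quotient always lies in $[-1, 1]$.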


Exercises:

1. Find the angle $\theta$ between the two vectors $\mathbf{v} = (1, 0, -1)^t$ and $(2, 1, 3)^t$.

2. When does $|\mathbf{x}\cdot\mathbf{y}| = \|\mathbf{x}\|\,\|\mathbf{y}\|$? What is $\theta$ when $\mathbf{x}\cdot\mathbf{y} = 0$?

Using the Cauchy-Schwarz inequality, we (i.e., you) can prove the triangle inequality:

Theorem: For all $\mathbf{x}, \mathbf{y}$, $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$.

Proof: Exercise. (Expand the dot product $\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y})\cdot(\mathbf{x} + \mathbf{y})$, use the Cauchy-Schwarz inequality, and take the square root.)

Exercise: The triangle inequality as it's usually encountered in geometry courses states that, in $\triangle ABC$, the distance from $A$ to $B$ is $\le$ the distance from $A$ to $C$ plus the distance from $C$ to $B$. Is this the same thing?

18 Orthogonality and related notions


18.1 Orthogonality 


Definition: Two vectors $\mathbf{x}$ and $\mathbf{y}$ are said to be orthogonal if $\mathbf{x}\cdot\mathbf{y} = 0$. (This is the fancy version of "perpendicular".)

Examples: The two vectors

\[ \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 2 \\ 2 \\ 4 \end{pmatrix} \]

are orthogonal, since their dot product is $(2)(1) + (2)(-1) + (4)(0) = 0$. The standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3 \in \mathbb{R}^3$ are mutually orthogonal. That is, $\mathbf{e}_i\cdot\mathbf{e}_j = 0$ whenever $i \ne j$. The vector $\mathbf{0}$ is orthogonal to everything.

Definition: A unit vector is a vector of length 1. If its length is 1, then the square of its length is also 1. So $\mathbf{v}$ is a unit vector if $\mathbf{v}\cdot\mathbf{v} = 1$.

If $\mathbf{w}$ is an arbitrary nonzero vector, then a unit vector in the direction of $\mathbf{w}$ is obtained by multiplying $\mathbf{w}$ by $\|\mathbf{w}\|^{-1}$: $\hat{\mathbf{w}} = (1/\|\mathbf{w}\|)\mathbf{w}$ is a unit vector in the direction of $\mathbf{w}$. The caret mark over the vector will always be used to indicate a unit vector.


Examples: The standard basis vectors are all unit vectors. If

\[ \mathbf{w} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad\text{then}\quad \hat{\mathbf{w}} = \frac{1}{\|\mathbf{w}\|}\mathbf{w} = \frac{1}{\sqrt{14}}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. \]

The process of replacing a vector $\mathbf{w}$ by a unit vector in its direction is called normalizing the vector.

For an arbitrary nonzero vector in $\mathbb{R}^3$

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \]

the corresponding unit vector is

\[ \frac{1}{\sqrt{x^2 + y^2 + z^2}}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

In physics and engineering courses, this particular vector is often denoted by $\hat{\mathbf{r}}$. For instance, the gravitational force on a particle of mass $m$ sitting at $(x, y, z)^t$ due to a particle of mass $M$ sitting at the origin is

\[ \mathbf{F} = \frac{-GMm}{r^2}\hat{\mathbf{r}}, \]

where $r^2 = x^2 + y^2 + z^2$.


18.2 Orthonormal bases 


Although we know that any set of n linearly independent vectors in R” can be used as a 


basis, there is a particularly nice collection of bases that we can use in Euclidean space. 


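Definition: A basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ of $\mathbb{E}^n$ is said to be orthonormal if

1. $\mathbf{v}_i\cdot\mathbf{v}_j = 0$, whenever $i \ne j$. That is, they are mutually orthogonal, and

2. $\mathbf{v}_i\cdot\mathbf{v}_i = 1$ for all $i$. They are all unit vectors.

Examples: The standard basis is orthonormal. The basis

\[ \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\} \]

is orthogonal, but not orthonormal. We can normalize these vectors to get the orthonormal basis

\[ \left\{ \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}, \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix} \right\}. \]

You may recall that it's quite tedious to compute the coordinates of a vector $\mathbf{w}$ in an arbitrary basis. The advantage of using an orthonormal basis is

Theorem: Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be an orthonormal basis in $\mathbb{E}^n$. Let $\mathbf{w} \in \mathbb{E}^n$. Then

\[ \mathbf{w} = (\mathbf{w}\cdot\mathbf{v}_1)\mathbf{v}_1 + (\mathbf{w}\cdot\mathbf{v}_2)\mathbf{v}_2 + \cdots + (\mathbf{w}\cdot\mathbf{v}_n)\mathbf{v}_n. \]

That is, the $i^{th}$ coordinate of $\mathbf{w}$ in this basis is given by $\mathbf{w}\cdot\mathbf{v}_i$, the dot product of $\mathbf{w}$ with the $i^{th}$ basis vector.

Proof: Since we have a basis, there are unique numbers $c_1, \ldots, c_n$ such that

\[ \mathbf{w} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n. \]

Take the dot product of both sides of the equation with $\mathbf{v}_1$: using the linearity of the dot product, we get

\[ \mathbf{v}_1\cdot\mathbf{w} = c_1(\mathbf{v}_1\cdot\mathbf{v}_1) + c_2(\mathbf{v}_1\cdot\mathbf{v}_2) + \cdots + c_n(\mathbf{v}_1\cdot\mathbf{v}_n). \]

Since the basis is orthonormal, all these dot products vanish except for the first, and we have $(\mathbf{v}_1\cdot\mathbf{w}) = c_1(\mathbf{v}_1\cdot\mathbf{v}_1) = c_1$. An identical argument holds for the general $\mathbf{v}_i$.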


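Example: Find the coordinates of the vector

\[ \mathbf{w} = \begin{pmatrix} 2 \\ -3 \end{pmatrix} \]

in the basis

\[ \{\mathbf{v}_1, \mathbf{v}_2\} = \left\{ \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}. \]

Solution: $\mathbf{w}\cdot\mathbf{v}_1 = 2/\sqrt{2} - 3/\sqrt{2} = -1/\sqrt{2}$, and $\mathbf{w}\cdot\mathbf{v}_2 = 2/\sqrt{2} + 3/\sqrt{2} = 5/\sqrt{2}$. So the coordinates of $\mathbf{w}$ in this basis are

\[ \frac{1}{\sqrt{2}}\begin{pmatrix} -1 \\ 5 \end{pmatrix}. \]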
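
The theorem turns coordinate-finding into a handful of dot products, with no linear system to solve. As a rough illustration (a Python sketch with illustrative names, taking $\mathbf{w} = (2, -3)^t$ and the orthonormal basis $\frac{1}{\sqrt{2}}(1,1)^t$, $\frac{1}{\sqrt{2}}(1,-1)^t$), the coordinates can be computed and then used to rebuild $\mathbf{w}$:

```python
import math


def dot(x, y):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]          # orthonormal basis {v1, v2}
w = [2.0, -3.0]

# The i-th coordinate of w is simply w . v_i.
coords = [dot(w, v) for v in basis]

# Sanity check: the linear combination sum_i coords[i] * v_i gives back w.
rebuilt = [sum(c * v[i] for c, v in zip(coords, basis)) for i in range(len(w))]
print(coords, rebuilt)
```

The shortcut relies on orthonormality; for a basis that is not orthonormal, the dot products `dot(w, v)` are generally not the coordinates.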


Exercises: 


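1. Let

\[ \{\mathbf{e}_1(\theta), \mathbf{e}_2(\theta)\} = \left\{ \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} \right\}. \]

What's the relation between $\{\mathbf{e}_1(\theta), \mathbf{e}_2(\theta)\}$ and $\{\mathbf{i}, \mathbf{j}\} = \{\mathbf{e}_1(0), \mathbf{e}_2(0)\}$?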


2. Let 


Find the coordinates of $\mathbf{v}$ in the basis $\{\mathbf{e}_1(\theta), \mathbf{e}_2(\theta)\}$

• By writing $\mathbf{v} = c_1\mathbf{e}_1(\theta) + c_2\mathbf{e}_2(\theta)$ and solving for $c_1, c_2$.

• By using the theorem above.


18.3 Orthogonal projections

It is frequently useful to decompose a given vector $\mathbf{v}$ as $\mathbf{v} = \mathbf{v}_\parallel + \mathbf{v}_\perp$, where $\mathbf{v}_\parallel$ is parallel to a vector $\mathbf{w}$, and $\mathbf{v}_\perp$ is orthogonal to $\mathbf{w}$.

Example: Suppose a mass $m$ is at the end of a rigid, massless rod (a pendulum, approximately), and the rod makes an angle $\theta$ with the vertical. The force acting on the pendulum is the gravitational force $-mg\,\mathbf{e}_2$. Since the pendulum is rigid, the force directed along the rod's direction doesn't do anything (i.e., doesn't cause the pendulum to move). Only the force orthogonal to the rod produces motion. The magnitude of the force parallel to the pendulum is $mg\cos\theta$, and the orthogonal force has magnitude $mg\sin\theta$. If the pendulum has length $l$, Newton's law ($F = ma$) reads

\[ ml\ddot{\theta} = -mg\sin\theta, \]

or

\[ \ddot{\theta} + \frac{g}{l}\sin\theta = 0. \]

This is the differential equation for the motion of the pendulum. For small angles, we have, approximately, $\sin\theta \approx \theta$, and the equation can be linearized to give

\[ \ddot{\theta} + \omega^2\theta = 0, \quad\text{where}\quad \omega = \sqrt{\frac{g}{l}}. \]

Figure: The pendulum bob makes an angle $\theta$ with the vertical. The magnitude of the force (gravity) acting on the bob is $mg$. The component of the force acting in the direction of motion of the pendulum has magnitude $mg\sin(\theta)$.

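18.4 Algorithm for the decomposition

We want to write $\mathbf{v} = \mathbf{v}_\parallel + \mathbf{v}_\perp$, where $\mathbf{v}_\parallel$ is in the direction of $\mathbf{w}$. See the figure. Suppose $\theta$ is the angle between $\mathbf{w}$ and $\mathbf{v}$. We assume for the moment that $0 \le \theta \le \pi/2$. Then

\[ \|\mathbf{v}_\parallel\| = \|\mathbf{v}\|\cos\theta = \|\mathbf{v}\|\left(\frac{\mathbf{v}\cdot\mathbf{w}}{\|\mathbf{v}\|\,\|\mathbf{w}\|}\right) = \frac{\mathbf{v}\cdot\mathbf{w}}{\|\mathbf{w}\|}, \]

or

\[ \|\mathbf{v}_\parallel\| = \mathbf{v}\cdot(\text{a unit vector in the direction of } \mathbf{w}). \]

Figure: If the angle between $\mathbf{v}$ and $\mathbf{w}$ is $\theta$, then the magnitude of the projection of $\mathbf{v}$ onto $\mathbf{w}$ is $\|\mathbf{v}\|\cos(\theta)$.

And $\mathbf{v}_\parallel$ is this number times a unit vector in the direction of $\mathbf{w}$:

\[ \mathbf{v}_\parallel = \frac{\mathbf{v}\cdot\mathbf{w}}{\|\mathbf{w}\|}\,\frac{\mathbf{w}}{\|\mathbf{w}\|} = \left(\frac{\mathbf{v}\cdot\mathbf{w}}{\mathbf{w}\cdot\mathbf{w}}\right)\mathbf{w}. \]

In other words, if $\hat{\mathbf{w}} = (1/\|\mathbf{w}\|)\mathbf{w}$, then $\mathbf{v}_\parallel = (\mathbf{v}\cdot\hat{\mathbf{w}})\hat{\mathbf{w}}$.

The vector $\mathbf{v}_\parallel$ is called the orthogonal projection of $\mathbf{v}$ onto $\mathbf{w}$. The nonzero vector $\mathbf{w}$ also determines a 1-dimensional subspace, denoted $W$, consisting of all multiples of $\mathbf{w}$, and $\mathbf{v}_\parallel$ is also known as the orthogonal projection onto the subspace $W$.

Since $\mathbf{v} = \mathbf{v}_\parallel + \mathbf{v}_\perp$, we have

\[ \mathbf{v}_\perp = \mathbf{v} - \mathbf{v}_\parallel. \]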


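Example: Let

\[ \mathbf{v} = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \quad\text{and}\quad \mathbf{w} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}. \]

Then $\|\mathbf{w}\| = \sqrt{2}$, so $\hat{\mathbf{w}} = (1/\sqrt{2})\mathbf{w}$, and

\[ \mathbf{v}_\parallel = (\mathbf{v}\cdot\hat{\mathbf{w}})\hat{\mathbf{w}} = \begin{pmatrix} 3/2 \\ 0 \\ 3/2 \end{pmatrix}. \]

Then

\[ \mathbf{v}_\perp = \mathbf{v} - \mathbf{v}_\parallel = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} - \begin{pmatrix} 3/2 \\ 0 \\ 3/2 \end{pmatrix} = \begin{pmatrix} -1/2 \\ -1 \\ 1/2 \end{pmatrix}, \]

and you can easily check that $\mathbf{v}_\parallel\cdot\mathbf{v}_\perp = 0$.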


Remark: Suppose that, in the above, $\pi/2 < \theta \le \pi$, so the angle is not acute. In this case, $\cos\theta$ is negative, and $\cos\theta\,\|\mathbf{v}\|$ is not the length of $\mathbf{v}_\parallel$ (since it's negative, it can't be a length). It has to be interpreted as a signed length, since the correct projection points in the opposite direction from that of $\mathbf{w}$. In other words, the formula is correct, no matter what the value of $\theta$.
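
The decomposition can be carried out mechanically from the formula $\mathbf{v}_\parallel = \left(\frac{\mathbf{v}\cdot\mathbf{w}}{\mathbf{w}\cdot\mathbf{w}}\right)\mathbf{w}$. The following Python sketch is one possible way to do it, assuming integer input vectors so that exact fractions can be used; the name `decompose` is illustrative. The second call uses an obtuse angle to show that the same formula still works, as the remark says.

```python
from fractions import Fraction


def dot(x, y):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def decompose(v, w):
    """Split v into (v_par, v_perp): v_par is a multiple of w, v_perp is orthogonal to w."""
    if dot(w, w) == 0:
        raise ValueError("w must be nonzero")
    scale = Fraction(dot(v, w), dot(w, w))      # (v . w) / (w . w)
    v_par = [scale * wi for wi in w]
    v_perp = [vi - pi for vi, pi in zip(v, v_par)]
    return v_par, v_perp


# Acute angle: v = (1, -1, 2)^t, w = (1, 0, 1)^t.
v_par, v_perp = decompose([1, -1, 2], [1, 0, 1])

# Obtuse angle: the projection points opposite to w (negative scale factor).
u_par, u_perp = decompose([-1, 0], [1, 1])
print(v_par, v_perp, u_par, u_perp)
```

Dividing by $\mathbf{w}\cdot\mathbf{w}$ rather than normalizing $\mathbf{w}$ first avoids square roots, which is what keeps the arithmetic exact here.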


Exercise: This refers to the pendulum figure. Suppose the mass is located at $(x, y) \in \mathbb{R}^2$. Find the unit vector parallel to the direction of the rod, say $\hat{\mathbf{r}}$, and a unit vector orthogonal to $\hat{\mathbf{r}}$, say $\hat{\boldsymbol{\theta}}$, obtained by rotating $\hat{\mathbf{r}}$ counterclockwise through an angle $\pi/2$. Express these orthonormal vectors in terms of the angle $\theta$. And show that $F_\theta = -mg\sin\theta$ as claimed above.

19 Orthogonal projections and Gram-Schmidt


19.1 Orthogonal matrices 


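Suppose we take an orthonormal (o.n.) basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ of $\mathbb{R}^n$ and form the $n \times n$ matrix $E = (\mathbf{e}_1 : \cdots : \mathbf{e}_n)$. Then

\[ E^tE = \begin{pmatrix} \mathbf{e}_1^t \\ \vdots \\ \mathbf{e}_n^t \end{pmatrix}(\mathbf{e}_1 : \cdots : \mathbf{e}_n) = I_n, \]

because

\[ (E^tE)_{ij} = \mathbf{e}_i^t\mathbf{e}_j = \mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}, \]

where $\delta_{ij}$ are the components of the identity matrix:

\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \]

Since $E^tE = I$, this means that $E^t = E^{-1}$.

Definition: A square matrix $E$ such that $E^t = E^{-1}$ is called an orthogonal matrix.

Example:

\[ \{\mathbf{e}_1, \mathbf{e}_2\} = \left\{ \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}, \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix} \right\} \]

is an o.n. basis for $\mathbb{R}^2$. The corresponding matrix

\[ E = (1/\sqrt{2})\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \]

is easily verified to be orthogonal. Of course the identity matrix is also orthogonal. As a converse to the above, if $E$ is an orthogonal matrix, the columns of $E$ form an o.n. basis of $\mathbb{R}^n$.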
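
Whether a given square matrix is orthogonal can be tested directly from the condition $E^tE = I$. Below is a short Python sketch of such a test; the names `transpose`, `matmul`, and `is_orthogonal` are illustrative, and because the entries involve $1/\sqrt{2}$ the comparison is done up to a small floating-point tolerance (an assumption of this sketch, not something the notes discuss).

```python
import math


def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def is_orthogonal(E, tol=1e-12):
    """True if E is square and E^t E equals the identity up to the tolerance."""
    n = len(E)
    if any(len(row) != n for row in E):
        return False
    P = matmul(transpose(E), E)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


s = 1 / math.sqrt(2)
E = [[s, s], [s, -s]]                       # columns form an o.n. basis of R^2
print(is_orthogonal(E))                     # columns are orthonormal
print(is_orthogonal([[1.0, 3.0], [3.0, 1.0]]))  # columns are not unit vectors
```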
Exercises:

• If $E$ is orthogonal, so is $E^t$, so the rows of $E$ also form an o.n. basis.

• If $E$ and $F$ are orthogonal and of the same dimension, then $EF$ is orthogonal.

• Let

\[ \{\mathbf{e}_1(\theta), \mathbf{e}_2(\theta)\} = \left\{ \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} \right\}. \]

Let $R(\theta) = (\mathbf{e}_1(\theta) : \mathbf{e}_2(\theta))$. Show that $R(\theta)R(\tau) = R(\theta + \tau)$.

• If $E$ and $F$ are the two orthogonal matrices corresponding to two o.n. bases, then $F = EP$, where $P$ is the change of basis matrix from $E$ to $F$. Show that $P$ is also orthogonal.


19.2 Construction of orthonormal bases 


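It is not obvious that any subspace $V$ of $\mathbb{R}^n$ has an orthonormal basis, but it's true. Here we give an algorithm for constructing such a basis, starting from an arbitrary basis. This is called the Gram-Schmidt procedure. We'll do it first for a 2-dimensional subspace of $\mathbb{R}^3$, and then do it in general at the end:

Let $V$ be a 2-dimensional subspace of $\mathbb{R}^3$, and let $\{\mathbf{f}_1, \mathbf{f}_2\}$ be a basis for $V$. We want to construct an o.n. basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ for $V$.

• The first step is easy. We define $\mathbf{e}_1 = \hat{\mathbf{f}}_1$ by normalizing $\mathbf{f}_1$.

• We now need a vector orthogonal to $\mathbf{e}_1$ which lies in the plane spanned by $\mathbf{f}_1$ and $\mathbf{f}_2$. We get this by decomposing $\mathbf{f}_2$ into vectors which are parallel to and orthogonal to $\mathbf{e}_1$: we have $\mathbf{f}_{2\parallel} = (\mathbf{f}_2\cdot\mathbf{e}_1)\mathbf{e}_1$, and $\mathbf{f}_{2\perp} = \mathbf{f}_2 - \mathbf{f}_{2\parallel}$.

• We now normalize this to get $\mathbf{e}_2 = (1/\|\mathbf{f}_{2\perp}\|)\mathbf{f}_{2\perp}$.

• Since $\mathbf{f}_{2\perp}$ is orthogonal to $\mathbf{e}_1$, so is $\mathbf{e}_2$. Moreover

\[ \mathbf{f}_{2\perp} = \mathbf{f}_2 - \left(\frac{\mathbf{f}_2\cdot\mathbf{f}_1}{\|\mathbf{f}_1\|^2}\right)\mathbf{f}_1, \]

so $\mathbf{f}_{2\perp}$ and hence $\mathbf{e}_2$ are linear combinations of $\mathbf{f}_1$ and $\mathbf{f}_2$. Therefore, $\mathbf{e}_1$ and $\mathbf{e}_2$ span the same space and give an orthonormal basis for $V$.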


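Example: Let $V$ be the subspace of $\mathbb{R}^3$ spanned by

\[ \{\mathbf{v}_1, \mathbf{v}_2\} = \left\{ \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \right\}. \]

Then $\|\mathbf{v}_1\| = \sqrt{6}$, so

\[ \mathbf{e}_1 = (1/\sqrt{6})\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}. \]

And

\[ \mathbf{v}_{2\perp} = \mathbf{v}_2 - (\mathbf{v}_2\cdot\mathbf{e}_1)\mathbf{e}_1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} - (2/3)\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = (1/3)\begin{pmatrix} -1 \\ 4 \\ -2 \end{pmatrix}. \]

Normalizing, we find

\[ \mathbf{e}_2 = (1/\sqrt{21})\begin{pmatrix} -1 \\ 4 \\ -2 \end{pmatrix}. \]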


Exercise: Let $E_{3\times 2} = (\mathbf{e}_1 : \mathbf{e}_2)$, where the columns are the orthonormal basis vectors found above. What is $E^tE$? What is $EE^t$? Is $E$ an orthogonal matrix? Why or why not?

Exercise: Find an orthonormal basis for the null space of the $1 \times 3$ matrix $A = (1, -2, 4)$.

Exercise: Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ be a set of (non-zero) orthogonal vectors. Prove that the set is linearly independent. (Hint: suppose that some linear combination is zero and show that all the coefficients must vanish.) Did you really need this hint?

19.3 Orthogonal projection onto a subspace V


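Suppose $V \subset \mathbb{R}^n$ is a subspace, and suppose that $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_k\}$ is an orthonormal basis for $V$. For any vector $\mathbf{x} \in \mathbb{R}^n$, we define

\[ \Pi_V(\mathbf{x}) = \sum_{i=1}^k (\mathbf{x}\cdot\mathbf{e}_i)\mathbf{e}_i. \]

$\Pi_V(\mathbf{x})$ is called the orthogonal projection of $\mathbf{x}$ onto $V$. This is the natural generalization to higher dimensions of the projection of $\mathbf{x}$ onto a one-dimensional space considered before. Notice what we're doing: we're projecting $\mathbf{x}$ onto each of the 1-dimensional spaces determined by the basis vectors and then adding them all up.

Example: Let $V$ be the column space of the matrix

\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \\ 1 & 0 \end{pmatrix}. \]

As we found above, an orthonormal basis for $V$ is given by

\[ \{\mathbf{e}_1, \mathbf{e}_2\} = \left\{ (1/\sqrt{6})\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}, (1/\sqrt{21})\begin{pmatrix} -1 \\ 4 \\ -2 \end{pmatrix} \right\}. \]

So if $\mathbf{x} = (1, 2, 3)^t$,

\[ \begin{aligned} \Pi_V(\mathbf{x}) &= (\mathbf{x}\cdot\mathbf{e}_1)\mathbf{e}_1 + (\mathbf{x}\cdot\mathbf{e}_2)\mathbf{e}_2 \\ &= (7/\sqrt{6})\mathbf{e}_1 + (1/\sqrt{21})\mathbf{e}_2 \\ &= (7/6)\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} + (1/21)\begin{pmatrix} -1 \\ 4 \\ -2 \end{pmatrix}. \end{aligned} \]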
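
The projection formula is easy to evaluate exactly if we work with the orthogonal, un-normalized vectors $\mathbf{u}_i$ and use $(\mathbf{x}\cdot\mathbf{e}_i)\mathbf{e}_i = \frac{\mathbf{x}\cdot\mathbf{u}_i}{\mathbf{u}_i\cdot\mathbf{u}_i}\mathbf{u}_i$, which holds whenever $\mathbf{e}_i = \mathbf{u}_i/\|\mathbf{u}_i\|$. The Python sketch below takes that route to avoid square roots; the function name `project` is illustrative, and the input basis is assumed to be mutually orthogonal (the function does not check this).

```python
from fractions import Fraction


def dot(x, y):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def project(x, orthogonal_basis):
    """Orthogonal projection of x onto the span of mutually orthogonal nonzero vectors."""
    result = [Fraction(0)] * len(x)
    for u in orthogonal_basis:
        c = Fraction(dot(x, u), dot(u, u))      # coefficient (x . u) / (u . u)
        result = [r + c * ui for r, ui in zip(result, u)]
    return result


x = [1, 2, 3]
projection = project(x, [[2, 1, 1], [-1, 4, -2]])

# The residual x - Pi_V(x) should be orthogonal to every vector in V.
residual = [xi - pi for xi, pi in zip(x, projection)]
print(projection, residual)
```

Adding up the two terms of the example gives the single vector stored in `projection`; the residual check confirms that what is left over is orthogonal to both spanning vectors $(2,1,1)^t$ and $(1,2,0)^t$.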


Exercises:

• Show that the function $\Pi_V : \mathbb{R}^n \to V$ is a linear transformation.

• (Extra credit): Normally we don't define geometric objects by using a basis. When we do, as in the case of $\Pi_V(\mathbf{x})$, we need to show that the concept is well-defined. In this case, we need to show that $\Pi_V(\mathbf{x})$ is the same, no matter which orthonormal basis in $V$ is used.

  1. Suppose that $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$ and $\{\tilde{\mathbf{e}}_1, \ldots, \tilde{\mathbf{e}}_k\}$ are two orthonormal bases for $V$. Then $\tilde{\mathbf{e}}_j = \sum_i P_{ij}\mathbf{e}_i$ for some $k \times k$ matrix $P$. Show that $P$ is an orthogonal matrix.

  2. Use this result to show that $\sum_i (\mathbf{x}\cdot\mathbf{e}_i)\mathbf{e}_i = \sum_i (\mathbf{x}\cdot\tilde{\mathbf{e}}_i)\tilde{\mathbf{e}}_i$, so that $\Pi_V(\mathbf{x})$ is independent of the basis.


19.4 Orthogonal complements 


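Definition. $V^\perp = \{\mathbf{x} \in \mathbb{R}^n \text{ such that } \mathbf{x}\cdot\mathbf{v} = 0, \text{ for all } \mathbf{v} \in V\}$ is called the orthogonal complement of $V$ in $\mathbb{R}^n$.

Exercise: $V^\perp$ is a subspace of $\mathbb{R}^n$.

Example: Let

\[ V = \mathrm{span}\left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}. \]

Then

\[ V^\perp = \left\{ \mathbf{v} \in \mathbb{R}^3 \text{ such that } \mathbf{v}\cdot\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 0 \right\} = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \text{ such that } x + y + z = 0 \right\}. \]

This is the same as the null space of the matrix $A = (1, 1, 1)$. (Isn't it?) So writing $s = y$, $t = z$, we have

\[ V^\perp = \left\{ \begin{pmatrix} -s - t \\ s \\ t \end{pmatrix} = s\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix},\ s, t \in \mathbb{R} \right\}. \]

A basis for $V^\perp$ is clearly given by the two indicated vectors; of course, it's not orthonormal, but we could remedy that if we wanted.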


Exercises: 


1. Let $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$ be a basis for $W$. Show that $\mathbf{v} \in W^\perp \iff \mathbf{v}\cdot\mathbf{w}_i = 0$, $\forall i$.


2. Let 
1 I 
W = span Oe) =f 
1 2 


Find a basis for $W^\perp$. Hint: Use the result of exercise 1 to get a system of two equations in two unknowns and solve it.


19.5 Gram-Schmidt - the general algorithm 


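Let $V$ be a subspace of $\mathbb{R}^n$, and $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ an arbitrary basis for $V$. We construct an orthonormal basis out of this as follows:

1. $\mathbf{e}_1 = \hat{\mathbf{v}}_1$ (recall that this means we normalize $\mathbf{v}_1$ so that it has length 1). Let $W_1$ be the subspace $\mathrm{span}\{\mathbf{e}_1\}$.

2. Take $\mathbf{f}_2 = \mathbf{v}_2 - \Pi_{W_1}(\mathbf{v}_2)$; then let $\mathbf{e}_2 = \hat{\mathbf{f}}_2$. Let $W_2 = \mathrm{span}\{\mathbf{e}_1, \mathbf{e}_2\}$.

3. Now assuming that $W_k$ has been constructed, we define, recursively,

\[ \mathbf{f}_{k+1} = \mathbf{v}_{k+1} - \Pi_{W_k}(\mathbf{v}_{k+1}), \quad \mathbf{e}_{k+1} = \hat{\mathbf{f}}_{k+1}, \quad\text{and}\quad W_{k+1} = \mathrm{span}\{\mathbf{e}_1, \ldots, \mathbf{e}_{k+1}\}. \]

4. Continue until $W_m$ has been defined. Then $\{\mathbf{e}_1, \ldots, \mathbf{e}_m\}$ is an orthonormal set in $V$, hence linearly independent, and thus a basis, since there are $m$ vectors in the set.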
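
The four steps translate almost line for line into a short program. What follows is a minimal Python sketch of the procedure as described, using floating-point arithmetic; the function name `gram_schmidt` and the tolerance `tol` used to detect a (numerically) zero vector are assumptions of this sketch rather than part of the notes.

```python
import math


def dot(x, y):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of linearly independent vectors."""
    basis = []                      # e_1, ..., e_k found so far (spans W_k)
    for v in vectors:
        # f = v - Pi_{W_k}(v): subtract the projection onto each e_i in turn.
        f = list(v)
        for e in basis:
            c = dot(v, e)
            f = [fi - c * ei for fi, ei in zip(f, e)]
        length = math.sqrt(dot(f, f))
        if length <= tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([fi / length for fi in f])   # e_{k+1} = f normalized
    return basis


e1, e2 = gram_schmidt([[2.0, 1.0, 1.0], [1.0, 2.0, 0.0]])
print(e1, e2)
```

For the two vectors $(2,1,1)^t$ and $(1,2,0)^t$ this reproduces, up to rounding, the basis $(1/\sqrt{6})(2,1,1)^t$, $(1/\sqrt{21})(-1,4,-2)^t$ from the worked example. In floating-point practice a reordered variant of the subtraction step (often called modified Gram-Schmidt) is usually preferred for better numerical behavior; the version here follows the text.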
20 Approximations - the method of least squares (1)

In many applications, we have to consider the following problem:

Suppose that for some $\mathbf{y}$, the equation $A\mathbf{x} = \mathbf{y}$ has no solutions. It could be that this is an important problem and we can't just forget about it. We could try to find an approximate solution. But which one? Suppose we choose an $\mathbf{x}$ at random. Then $A\mathbf{x} \ne \mathbf{y}$. In choosing this $\mathbf{x}$, we'll make an error $\mathbf{e} = A\mathbf{x} - \mathbf{y}$. A reasonable choice (not the only one) is to seek an $\mathbf{x}$ with the property that $\|A\mathbf{x} - \mathbf{y}\|$, the magnitude of the error, is as small as possible. (If this error is 0, then we have an exact solution, so it seems like a reasonable thing to try and minimize it.) Since this is a bit abstract, we can look at a familiar example:

Example: Suppose we have a bunch of data in the form of ordered pairs: $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. These data might come from an experiment; for instance, $x_i$ might be the current through some device and $y_i$ might be the temperature of the device while the given current flows through it. The $n$ data points correspond to $n$ different experimental observations.

The problem is to "fit" a straight line to this data. Another way to put this is: find the linear model that "best" predicts $y$, given $x$. Clearly, this is a problem which has no exact solution unless all the data points are collinear: in general, there's no single line which goes through all the points. So how do we choose? The problem is to find $m$ and $b$ such that $y = mx + b$ is, in some sense, the best possible fit. The first thing to do is to convince ourselves that this problem is a special case of finding an approximate solution to $A\mathbf{x} = \mathbf{y}$:

Suppose we fix $m$ and $b$. If the resulting line were a perfect fit, we'd have

\[ \begin{aligned} y_1 &= mx_1 + b \\ y_2 &= mx_2 + b \\ &\ \ \vdots \\ y_n &= mx_n + b. \end{aligned} \]

Put

\[ \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \quad\text{and}\quad \mathbf{x} = \begin{pmatrix} m \\ b \end{pmatrix}. \]

Then the linear system above takes the form $\mathbf{y} = A\mathbf{x}$, where $A$ and $\mathbf{y}$ are known, and the problem is that there is no solution $\mathbf{x} = (m, b)^t$.


20.1 The method of least squares 


We can visualize the problem geometrically. Think of the matrix $A$ as defining a linear function $f_A : \mathbb{R}^n \to \mathbb{R}^m$. The range of $f_A$ is a subspace of $\mathbb{R}^m$, and the source of our problem is that $\mathbf{y} \notin \mathrm{Range}(f_A)$. If we pick an arbitrary point $A\mathbf{x} \in \mathrm{Range}(f_A)$, then the error we've made is $\mathbf{e} = A\mathbf{x} - \mathbf{y}$. We want to choose $A\mathbf{x}$ so that $\|\mathbf{e}\|$ is as small as possible.

Exercise: This could clearly be handled as a calculus problem. How?

Instead of using calculus, we can do something simpler. We decompose the error as $\mathbf{e} = \mathbf{e}_\parallel + \mathbf{e}_\perp$, where $\mathbf{e}_\parallel \in \mathrm{Range}(f_A)$ and $\mathbf{e}_\perp \in \mathrm{Range}(f_A)^\perp$. See the figure on the next page.

Then $\|\mathbf{e}\|^2 = \|\mathbf{e}_\parallel\|^2 + \|\mathbf{e}_\perp\|^2$ (by Pythagoras' theorem!). Changing our choice of $A\mathbf{x}$ does not change $\mathbf{e}_\perp$, so the only variable at our disposal is $\mathbf{e}_\parallel$. We can make this $\mathbf{0}$ by choosing $A\mathbf{x}$ so that $\Pi(\mathbf{y}) = A\mathbf{x}$, where $\Pi$ is the orthogonal projection of $\mathbb{R}^m$ onto the range of $f_A$.


Figure 2: The plane is the range of $f_A$. To minimize $\|\mathbf{e}\|$, we make $\mathbf{e}_\parallel = \mathbf{0}$ by choosing $\mathbf{x}$ so that $A\mathbf{x} = \Pi_V(\mathbf{y})$. So $A\mathbf{x}$ is the unlabeled vector from $\mathbf{0}$ to the foot of $\mathbf{e}_\perp$.

And this is the answer to our question. Instead of solving $A\mathbf{x} = \mathbf{y}$, which is impossible, we solve for $\mathbf{x}$ in the equation $A\mathbf{x} = \Pi(\mathbf{y})$, which is guaranteed to have a solution. So we have minimized the squared length of the error $\mathbf{e}$, thus the name least squares approximation. We collect this information in a

Definition: The vector $\mathbf{x}$ is said to be a least squares solution to $A\mathbf{x} = \mathbf{y}$ if the error vector $\mathbf{e} = A\mathbf{x} - \mathbf{y}$ is orthogonal to the range of $f_A$.


Example (cont'd.): Note: We're writing this down to demonstrate that we could, if we had to, find the least squares solution by solving $A\mathbf{x} = \Pi(\mathbf{y})$ directly. But this is not what's done in practice, as we'll see in the next lecture. In particular, this is not an efficient way to proceed.

That having been said, let's use what we now know to find the line which best fits the data points. (This line is called the least squares regression line, and you've probably encountered it before.) We have to project $\mathbf{y}$ into the range of $f_A$, where

\[ A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}. \]

To do this, we need an orthonormal basis for the range of $f_A$, which is the same as the column space of the matrix $A$. We apply the Gram-Schmidt process to the columns of $A$, starting with the easy one:

\[ \mathbf{e}_1 = \frac{1}{\sqrt{n}}\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}. \]

If we write $\mathbf{v}$ for the first column of $A$, we now need to compute

\[ \mathbf{v}_\perp = \mathbf{v} - (\mathbf{v}\cdot\mathbf{e}_1)\mathbf{e}_1. \]

A routine computation (exercise!) gives

\[ \mathbf{v}_\perp = \begin{pmatrix} x_1 - \bar{x} \\ x_2 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{pmatrix}, \quad\text{where}\quad \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \]
oS be 


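is the mean or average value of the x-measurements. Then

\[ \mathbf{e}_2 = \frac{1}{\sqrt{n}\,\sigma}\begin{pmatrix} x_1 - \bar{x} \\ x_2 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{pmatrix}, \quad\text{where}\quad \sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 \]

is the variance of the x-measurements. Its square root, $\sigma$, is called the standard deviation of the measurements.

We can now compute

\[ \begin{aligned} \Pi(\mathbf{y}) &= (\mathbf{y}\cdot\mathbf{e}_1)\mathbf{e}_1 + (\mathbf{y}\cdot\mathbf{e}_2)\mathbf{e}_2 \\ &= \text{routine computation here} \ldots \\ &= \bar{y}\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} + \frac{1}{n\sigma^2}\left\{ \sum_{i=1}^n x_iy_i - n\bar{x}\bar{y} \right\}\begin{pmatrix} x_1 - \bar{x} \\ x_2 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{pmatrix}, \end{aligned} \]

where $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$ is the mean of the y-measurements. For simplicity, let

\[ a = \frac{1}{n\sigma^2}\left\{ \sum_{i=1}^n x_iy_i - n\bar{x}\bar{y} \right\}. \]

Then the system of equations $A\mathbf{x} = \Pi(\mathbf{y})$ reads

\[ \begin{aligned} mx_1 + b &= ax_1 + \bar{y} - a\bar{x} \\ mx_2 + b &= ax_2 + \bar{y} - a\bar{x} \\ &\ \ \vdots \\ mx_n + b &= ax_n + \bar{y} - a\bar{x}, \end{aligned} \]

and we know (why?) that the augmented matrix for this system has rank 2. So we can solve for $m$ and $b$ just using the first two equations, assuming $x_1 \ne x_2$ so these two are not multiples of one another. Subtracting the second from the first gives

\[ m(x_1 - x_2) = a(x_1 - x_2), \quad\text{or}\quad m = a. \]

Now substituting $a$ for $m$ in either equation gives

\[ b = \bar{y} - a\bar{x}. \]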


These are the formulas your graphing calculator uses to compute the slope and y-intercept 


of the regression line. 
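
Written out in terms of the data, the slope is $m = \left(\sum x_iy_i - n\bar{x}\bar{y}\right) / \sum (x_i - \bar{x})^2$ and the intercept is $b = \bar{y} - m\bar{x}$. As a hedged illustration, the Python sketch below evaluates these two formulas exactly with rational arithmetic; the function name `regression_line` and the three sample points are made up for the example and do not come from the notes.

```python
from fractions import Fraction


def regression_line(points):
    """Slope and intercept of the least squares line through integer data points (x_i, y_i)."""
    n = len(points)
    x_bar = Fraction(sum(x for x, _ in points), n)      # mean of the x-values
    y_bar = Fraction(sum(y for _, y in points), n)      # mean of the y-values
    s_xy = sum(x * y for x, y in points) - n * x_bar * y_bar
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)     # n times the variance of x
    if s_xx == 0:
        raise ValueError("all x-values coincide; the slope is undefined")
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Hypothetical data, chosen only to keep the arithmetic small.
m, b = regression_line([(0, 1), (1, 3), (2, 4)])
print(m, b)
```

The guard against `s_xx == 0` corresponds to the assumption in the text that the x-values are not all equal; without it the two columns of $A$ would be multiples of one another.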


This is also about the simplest possible least squares computation we can imagine, and it's much too complicated to be of any practical use. Fortunately, there's a much easier way to do the computation, which is the subject of the next lecture.

21 Least squares approximation - II


21.1 The transpose of A 


In the next section we'll develop an equation, known as the normal equation, which is much easier to solve than $A\mathbf{x} = \Pi(\mathbf{y})$, and which also gives the correct $\mathbf{x}$. Of course, we need a bit of background first.

The transpose of a matrix, which we haven't made much use of until now, begins to play a more important role once the dot product has been introduced. If $A$ is an $m \times n$ matrix, then as you know, it can be regarded as a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. Its transpose, $A^t$, then gives a linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^n$, since it's $n \times m$. Note that there is no implication here that $A^t = A^{-1}$: the matrices needn't be square, and even if they are, they need not be invertible. But $A$ and $A^t$ are related by the dot product:

Theorem: $\mathbf{x}\cdot A^t\mathbf{y} = A\mathbf{x}\cdot\mathbf{y}$

Proof: (Notice that the dot product on the left is in $\mathbb{R}^n$, while the one on the right is in $\mathbb{R}^m$.) The proof is a "straightforward" computation:

\[ A\mathbf{x}\cdot\mathbf{y} = \sum_{i=1}^m (A\mathbf{x})_iy_i = \sum_{i=1}^m \left(\sum_{j=1}^n A_{ij}x_j\right)y_i; \]

now we reverse the order of summation to get

\[ = \sum_{j=1}^n x_j\left(\sum_{i=1}^m A_{ij}y_i\right), \]

and since $A_{ij} = A^t_{ji}$, we get

\[ = \sum_{j=1}^n x_j(A^t\mathbf{y})_j = \mathbf{x}\cdot A^t\mathbf{y}. \]

What this says in plain English: we can "move" $A$ from one side of the dot product to the other by replacing it with $A^t$. So for instance, if $A\mathbf{x}\cdot\mathbf{y} = 0$, then $\mathbf{x}\cdot A^t\mathbf{y} = 0$, and conversely.
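
The identity $\mathbf{x}\cdot A^t\mathbf{y} = A\mathbf{x}\cdot\mathbf{y}$ is easy to spot-check on a non-square example, where the two dot products live in different spaces. The Python sketch below does this for an arbitrarily chosen $3 \times 2$ integer matrix; all names and numbers are illustrative, and a single numerical check is of course not a proof.

```python
def dot(x, y):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [dot(row, v) for row in M]


A = [[1, 2], [0, -1], [3, 4]]     # 3 x 2, so A maps R^2 to R^3
x = [2, -1]                       # vector in R^2
y = [1, 5, -2]                    # vector in R^3

lhs = dot(x, matvec(transpose(A), y))   # x . (A^t y), a dot product in R^2
rhs = dot(matvec(A, x), y)              # (A x) . y,   a dot product in R^3
print(lhs, rhs)
```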


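In fact, pushing this a bit, we get an important

Theorem: $\mathrm{Ker}(A^t) = (\mathrm{Range}(A))^\perp$.

Proof: Let $\mathbf{y} \in (\mathrm{Range}(A))^\perp$. This means that for all $\mathbf{x} \in \mathbb{R}^n$, $A\mathbf{x}\cdot\mathbf{y} = 0$. But by the previous theorem, this means that $\mathbf{x}\cdot A^t\mathbf{y} = 0$ for all $\mathbf{x} \in \mathbb{R}^n$. But any vector in $\mathbb{R}^n$ which is orthogonal to everything must be the zero vector. So $A^t\mathbf{y} = \mathbf{0}$ and $\mathbf{y} \in \mathrm{Ker}(A^t)$. Conversely, if $\mathbf{y} \in \mathrm{Ker}(A^t)$, then for any $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x}\cdot A^t\mathbf{y} = 0$. And again by the theorem, this means that $A\mathbf{x}\cdot\mathbf{y} = 0$ for all such $\mathbf{x}$, which means that $\mathbf{y} \perp \mathrm{Range}(A)$.

We have shown that $(\mathrm{Range}(A))^\perp \subseteq \mathrm{Ker}(A^t)$, and conversely, that $\mathrm{Ker}(A^t) \subseteq (\mathrm{Range}(A))^\perp$. So the two sets are equal.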


21.2 Least squares approximations — the Normal equation 


Now we're ready to take up the least squares problem again. Recall that the problem is to solve $A\mathbf{x} = \Pi(\mathbf{y})$, where $\mathbf{y}$ has been projected orthogonally onto the range of $A$. The problem with solving this, as you'll recall, is that finding the projection $\Pi$ is tedious. And now we'll see that it's not necessary.

We write $\mathbf{y} = \Pi(\mathbf{y}) + \mathbf{y}_\perp$, where $\mathbf{y}_\perp$ is orthogonal to the range of $A$. Now suppose that $\mathbf{x}$ is a solution to the least squares problem $A\mathbf{x} = \Pi(\mathbf{y})$. Multiply this equation by $A^t$ to get $A^tA\mathbf{x} = A^t\Pi(\mathbf{y})$. So $\mathbf{x}$ is certainly also a solution to this. But now we notice that, in consequence of the previous theorem,

\[ A^t\mathbf{y} = A^t(\Pi(\mathbf{y}) + \mathbf{y}_\perp) = A^t\Pi(\mathbf{y}), \]

since $A^t\mathbf{y}_\perp = \mathbf{0}$. (It's orthogonal to the range, so the theorem says it's in $\mathrm{Ker}(A^t)$.)

So $\mathbf{x}$ is also a solution to the normal equation

\[ A^tA\mathbf{x} = A^t\mathbf{y}. \]

Conversely, if $\mathbf{x}$ is a solution to the normal equation, then

\[ A^t(A\mathbf{x} - \mathbf{y}) = \mathbf{0}, \]

and by the previous theorem, this means that $A\mathbf{x} - \mathbf{y}$ is orthogonal to the range of $A$. But $A\mathbf{x} - \mathbf{y}$ is the error made using an approximate solution, and this shows that the error vector is orthogonal to the range of $A$: this is our definition of the least squares solution!

The reason for all this fooling around is simple: we can compute $A^t\mathbf{y}$ by doing a simple matrix multiplication. We don't need to find an orthonormal basis for the range of $A$ to compute $\Pi$. We summarize the results:

Theorem: $\mathbf{x}$ is a least-squares solution to $A\mathbf{x} = \mathbf{y}$ $\iff$ $\mathbf{x}$ is a solution to the normal equation $A^tA\mathbf{x} = A^t\mathbf{y}$.
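Example: Find the least squares regression line through the 4 points $(1, 2), (2, 3), (-1, 1), (0, 1)$.

Solution: We've already set up this problem in the last lecture. We have

\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 1 \\ -1 & 1 \\ 0 & 1 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} 2 \\ 3 \\ 1 \\ 1 \end{pmatrix}, \quad\text{and}\quad \mathbf{x} = \begin{pmatrix} m \\ b \end{pmatrix}. \]

We compute

\[ A^tA = \begin{pmatrix} 6 & 2 \\ 2 & 4 \end{pmatrix}, \quad A^t\mathbf{y} = \begin{pmatrix} 7 \\ 7 \end{pmatrix}. \]

And the solution to the normal equation is

\[ \mathbf{x} = (A^tA)^{-1}A^t\mathbf{y} = (1/20)\begin{pmatrix} 4 & -2 \\ -2 & 6 \end{pmatrix}\begin{pmatrix} 7 \\ 7 \end{pmatrix} = \begin{pmatrix} 7/10 \\ 7/5 \end{pmatrix}. \]

So the regression line has the equation $y = (7/10)x + 7/5$.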
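
The same computation can be scripted: build $A^tA$ and $A^t\mathbf{y}$, then solve the resulting $2 \times 2$ system. The sketch below does this in Python for a line fit, with exact fractions and Cramer's rule for the final solve; the function name `normal_equation_line` is illustrative, and the approach shown is only meant for this small two-parameter case with integer data.

```python
from fractions import Fraction


def normal_equation_line(points):
    """Solve A^t A x = A^t y for the line y = m x + b through integer points (x_i, y_i)."""
    A = [[x, 1] for x, _ in points]              # rows (x_i, 1)
    y = [yi for _, yi in points]
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
    Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(2)]
    (p, q), (r, s) = AtA
    det = p * s - q * r
    if det == 0:
        raise ValueError("A^t A is singular: the x-values are all equal")
    # Cramer's rule for the 2 x 2 system.
    m = Fraction(s * Aty[0] - q * Aty[1], det)
    b = Fraction(p * Aty[1] - r * Aty[0], det)
    return AtA, Aty, (m, b)


AtA, Aty, (m, b) = normal_equation_line([(1, 2), (2, 3), (-1, 1), (0, 1)])
print(AtA, Aty, m, b)
```

For larger problems one would normally hand the system to a linear algebra library instead of using Cramer's rule; the point here is only that forming $A^tA$ and $A^t\mathbf{y}$ takes nothing more than matrix multiplication.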


Exercises: 


1. For these problems, think of the row space as the column space of $A^t$. Show that $\mathbf{v}$ is in the row space of $A$ $\iff$ $\mathbf{v} = A^t\mathbf{y}$ for some $\mathbf{y}$. This means that the row space of $A$ is the range of $f_{A^t}$ (analogous to the fact that the column space of $A$ is the range of $f_A$).

2. Show that the null space of $A$ is the orthogonal complement of the row space. (Hint: use the above theorem with $A^t$ instead of $A$.)